XML Schema Datatypes comments from cotton@ca.ibm.com on 2000-03-10 (www-xml-schema-comments@w3.org from January to March 2000)

From: <cotton@ca.ibm.com>
Date: Thu, 9 Mar 2000 19:54:10 -0500
To: www-xml-schema-comments@w3.org
Message-ID: <8725689E.001C9CD9.00@d53mta04h.boulder.ibm.com>

I have reviewed the XML Schema Datatypes specification and discussed my
comments with Paul Biron (at XTech 2000).  I have given Paul my editorial
nits as a hard copy of my marked-up document.

The following are my non-editorial comments:

1. Section 2.4.1.2 Order
This section states "In such cases each datatype will define a different
order relation on the value space".  I do not understand why this must be
done.  Certainly at worst it should say "may define".  Better still would be
to delete the sentence entirely.

2. Section 2.4.2.5 enumeration
This section states "No order or any other relationship is implied ...".
This seems to imply that enumerations are not ordered.  I think this
sentence needs to be reworded to say that "No further ordering is
implied", since the ordering of the underlying datatype must certainly be
inherited.  If it is not, then XML Query will have no means of ordering
enumerations.
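
To illustrate the point about inherited ordering, here is a minimal sketch
in Python.  It assumes a hypothetical enumeration whose values were declared
in the order 33, 2, 10; the function name and the sample values are
illustrative and do not come from the specification.

```python
def order_enumeration(declared_values, base_type_key):
    """Order enumerated values by the ordering of their base datatype.

    declared_values: lexical values in declaration order (hypothetical data).
    base_type_key:   maps a lexical value into the base type's value space.
    """
    return sorted(declared_values, key=base_type_key)


declared = ["33", "2", "10"]

# Base type integer: values compare numerically.
print(order_enumeration(declared, int))   # ['2', '10', '33']

# Base type string: values compare by Unicode code point.
print(order_enumeration(declared, str))   # ['10', '2', '33']
```

The declaration order itself implies nothing here; the result depends only
on the ordering of the base datatype.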

3. Section 3.2.1 string
This section states "The ordered property of string is the Unicode
character number sequence." The string data type is the only primitive
datatype that makes an explicit statement about how the ordering relation
(not property) is defined.  I expect the ordering information is missing
from other primitive datatype sections.

4. Section 3.2.1 string
This section states "The ordered property of string is the Unicode
character number sequence."  I wonder why the definition of the string
datatype does not permit a user to define the "collation" to be used?
"Unicode character number sequence" is only one "collation" and is not very
useful.  In addition the specification does not explain why this
"collation" is needed.

XML Query will need to support different collations for the string data
type.  It would be preferable if the collation was defined as part of the
<data type> not as part of the query <predicate>s.  I would recommend you
consider a solution such as one adopted by SQL to permit the type definer
to simply name the collation to be used.  No exact definition of the actual
collation needs to be provided since there are several other sources for
this information.
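
The difference between code point order and a collation can be sketched in
Python as follows.  The key function below is only a rough, hypothetical
stand-in for a named collation (it folds case and drops accents); a real
collation would come from an external definition, as suggested above.

```python
import unicodedata


def code_point_order(words):
    # Python compares str values code point by code point.
    return sorted(words)


def simple_collation_key(word):
    # Hypothetical stand-in for a named collation: decompose the string,
    # drop combining marks (accents), then fold case.
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["apple", "Zebra", "\u00e9clair"]

print(code_point_order(words))                   # ['Zebra', 'apple', 'éclair']
print(sorted(words, key=simple_collation_key))   # ['apple', 'éclair', 'Zebra']
```

Under code point order every upper-case ASCII letter sorts before every
lower-case one, which is rarely the order a user expects.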

5. Section 3.2.5 decimal
The Note in this section states "Our design discussions did not reveal
convincing evidence of undue burden because of arbitrary precision decimal
numbers in this design, but we welcome further input from implementors".

I believe that you may want to consider the impact on implementors of a
query language based on this data type that must implement <predicate>s and
arithmetic operators for an "arbitrary precision decimal number".  I believe
we will find this to be too expensive and that implementations will in fact
constrain the precision of this data type.  If the XML Schema specification
does not do this then interoperability will be heavily constrained.

I do not accept the argument that XML Schema needs an arbitrarily precise
decimal datatype just to be able to model the length of names in XML which
are in turn unconstrained in length.

I suggest that the document be modified to state that the maximum precision
for decimal numbers should be an "implementation-defined number not less
than X" where X can be agreed upon by implementors as a practical lower
limit for this amount.   "Implementation-defined" means that a conforming
implementation must state in its conformance statement what the value is.
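
The interoperability concern can be sketched with Python's decimal module.
The two precision limits below (18 and 28 digits) are hypothetical
implementation choices, not values proposed for X, and the helper name is
illustrative.

```python
from decimal import Context, Decimal


def parse_with_precision(lexical, max_digits):
    # Read a decimal literal, rounding it to at most max_digits
    # significant digits, as a precision-limited implementation might.
    return Context(prec=max_digits).create_decimal(lexical)


lexical = "1.00000000000000000001"   # 21 significant digits

narrow = parse_with_precision(lexical, 18)   # hypothetical implementation A
wide = parse_with_precision(lexical, 28)     # hypothetical implementation B

print(narrow == Decimal(1))   # True: the low-order digits were rounded away
print(wide == Decimal(1))     # False: the value was kept exactly
```

The same predicate evaluated over the same document gives different answers
in the two implementations, which is why a stated minimum precision would
help.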

6. Section 3.3.9 integer
The definition of the lexical representation of the integer datatype does
not correctly reflect that non-significant leading and trailing zeroes
should not be used.  Non-significant zeroes are leading zeroes to the left
of the decimal point or trailing zeroes to the right of the decimal point.
I suggest using this concept in the descriptive material.
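
As a minimal sketch of the concept, the Python function below removes
non-significant zeroes from a decimal or integer literal.  It assumes the
input is already a syntactically valid literal, and the function name is
illustrative only.

```python
def strip_nonsignificant_zeros(lexical):
    """Remove leading zeroes left of the decimal point and trailing
    zeroes right of it.  Assumes a valid literal such as '-0042' or '7.50'.
    """
    sign = ""
    body = lexical
    if body.startswith(("+", "-")):
        sign = "-" if body[0] == "-" else ""
        body = body[1:]

    int_part, _, frac_part = body.partition(".")
    int_part = int_part.lstrip("0") or "0"   # keep a single zero if all were zeroes
    frac_part = frac_part.rstrip("0")

    result = int_part + ("." + frac_part if frac_part else "")
    if result == "0":
        sign = ""                            # avoid producing '-0'
    return sign + result


print(strip_nonsignificant_zeros("007"))       # 7
print(strip_nonsignificant_zeros("-0042"))     # -42
print(strip_nonsignificant_zeros("007.500"))   # 7.5
```

For the integer datatype only the leading-zero rule applies, since an
integer literal has no fractional part.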

7. Section 3.3.22 date
There is no specific definition in this specification of the value ranges
of the CC, YY, MM, and DD parts of a date.  Although this is probably
defined in ISO 8601 it would be preferable if this information was included
directly in this specification.  This comment also applies to the time
datatype.
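
The kind of range definition being asked for could be expressed along the
following lines.  This Python sketch assumes a CCYY-MM-DD lexical form and
Gregorian month lengths; both are assumptions made for illustration, and the
authoritative ranges would have to come from ISO 8601 or the specification
itself.

```python
import calendar
import re

# Days per month in a non-leap year (Gregorian calendar).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_valid_ccyymmdd(text):
    # Assumed lexical form: CCYY-MM-DD, two digits per part.
    match = re.fullmatch(r"([0-9]{2})([0-9]{2})-([0-9]{2})-([0-9]{2})", text)
    if match is None:
        return False
    cc, yy, mm, dd = (int(part) for part in match.groups())
    year = cc * 100 + yy

    if not 1 <= mm <= 12:            # MM ranges over 01..12
        return False
    limit = DAYS_IN_MONTH[mm - 1]
    if mm == 2 and calendar.isleap(year):
        limit += 1                   # February has 29 days in a leap year
    return 1 <= dd <= limit          # DD ranges over 01..limit


print(is_valid_ccyymmdd("2000-02-29"))   # True
print(is_valid_ccyymmdd("1900-02-29"))   # False
print(is_valid_ccyymmdd("2000-13-01"))   # False
```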

/paulc

Paul Cotton, DB2 Language Architecture & Standards
IBM Canada Ltd, 17 Eleanor Drive, Nepean, Ontario K2E 6A3
Phone: (613) 225-5445   Fax:  (613) 226-6913
email: cotton@ca.ibm.com
Received on Friday, 10 March 2000 00:12:52 UTC