RE: XML Schema datatypes: NaN, lists of union types, [NEL], miscellaneous editorial

Curt:
Thank you for your comments.  I have responded to some of them 
by adding remarks prefixed with AM>> in your text below.
Some of your comments need more thought.  Some we should consider for
v1.1.
Ashok

-----Original Message-----
From: Arnold, Curt [mailto:Curt.Arnold@hyprotech.com]
Sent: Wednesday, April 04, 2001 1:36 PM
To: 'www-xml-schema-comments@w3.org'
Cc: 'xml-dev@lists.xml.org'; 'malaika@us.ibm.com'
Subject: XML Schema datatypes: NaN, lists of union types, [NEL],
miscellaneous editorial


I've finally had a chance to review the recent datatypes draft (really
the 16th March one, but I believe the comments apply equally to the 30
March draft).  I'll try to list the issues in order of
significance, but I definitely believe the first few must be
addressed (or at least adequately explained).

1. Treatment of NaN

In the definition for float and double, the following sentence appears:

"Not-a-number equals itself and is greater than all float values
including positive infinity."

This treatment of NaN appears to be incompatible with the treatment of
NaN in the IEEE spec and in Java and was introduced into this last draft
without any explanation.

For example, from the Java Language Specification:
http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html#9290

"NaN is unordered, so the numerical comparison operators <, <=, >, and
>= return false if either or both operands are NaN (§15.20.1). The
equality operator == returns false if either operand is NaN,
and the inequality operator != returns true if either operand is NaN
(§15.21.1). In particular, x!=x is true if and only if x is NaN, and
(x<y) == !(x>=y) will be false if x or y is NaN."

If this conceit has substantial value (like allowing double to be
ordered), then it should be appropriately explained, along with the
limits on how far this abnormal treatment of NaN needs to propagate
into other XML specs.

I am not aware of the motivation for this treatment, however if the
motivation is to force double and float to be fully ordered and support
the equality concept (which traditional NaN treatment
violates since NaN != NaN), then I would suggest removing NaN from the
lexical and value space of double and float, adding a distinct NaN type
(the first datatype whose value space does not support
the notion of equality) and adding unions of double and NaN and float
and NaN as built-in datatypes.
AM>> We were persuaded to use what the Java implementations do and not
AM>> what the language spec says.  NaN is not a problem for Schema since
AM>> I doubt that anyone will use NaN as a facet value, but it is a
AM>> problem for Query etc.
AM>> A bigger problem is +0 != -0
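
To make the two treatments of NaN concrete, here is a minimal Python
sketch.  Python floats follow IEEE 754, so the first part shows the
unordered behaviour described in the JLS quotation.  The second part is
a hypothetical sort key (draft_key and draft_equal are illustrative
names, not anything from the draft) that realizes the draft's rule that
NaN equals itself and is greater than positive infinity.

```python
import math

NAN = float("nan")
INF = float("inf")

# IEEE 754 behaviour, matching the JLS quotation: NaN is unordered.
ieee_nan_equals_itself = (NAN == NAN)      # False
ieee_nan_greater_than_inf = (NAN > INF)    # False
ieee_zeros_equal = (0.0 == -0.0)           # True, although the signs differ


def draft_key(x: float) -> tuple:
    """Hypothetical sort key for the draft's rule: NaN equals itself and
    sorts above every other value, including positive infinity."""
    return (1, 0.0) if math.isnan(x) else (0, x)


def draft_equal(a: float, b: float) -> bool:
    """Equality under the draft's rule (an illustration, not spec text)."""
    return draft_key(a) == draft_key(b)


# Sorting with the key places NaN last, after positive infinity.
ordered = sorted([NAN, INF, 1.0, -INF], key=draft_key)
```

For comparison, Java's Double.compare and Double.equals also treat NaN
as equal to itself and greater than positive infinity, and they
distinguish 0.0 from -0.0, which may be the behaviour the AM>> remarks
allude to.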


2. Barring lists of union types

Section 2.5.1.2 repeatedly defines list datatypes as lists of atomic
datatypes (as opposed to union datatypes or list datatypes).  Section
2.5.1.3 explicitly allows union datatypes to have members
that are either atomic or list datatypes.  I assume that union datatypes
are excluded from lists to prevent indirectly allowing lists of lists.

I would suggest that the value of allowing lists of union types is
substantially greater than allowing unions that include list types.
Unions of union types would also be significantly valuable.

I would recommend that:

Any number of atomic or union datatypes can participate in a union
datatype.  No list types can participate in a union.  That "atomic or
union" replace "atomic" in section 2.5.1.2.

AM>> I believe this is a bug that has since been fixed.
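
As an illustration of why a list of a union type is useful, the sketch
below parses a whitespace-separated list whose items may each be either
an integer or the token "unbounded".  The member types and the function
names are assumptions chosen for the example; they are not taken from
the draft.

```python
import re
from typing import List, Union

Item = Union[int, str]


def parse_union_item(token: str) -> Item:
    """Try each member of the (hypothetical) union in order."""
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    if token == "unbounded":
        return token
    raise ValueError(f"{token!r} matches no member of the union")


def parse_list_of_union(lexical: str) -> List[Item]:
    """Split on XML 1.0 whitespace and validate every item against the union."""
    tokens = [t for t in re.split(r"[ \t\n\r]+", lexical) if t]
    return [parse_union_item(t) for t in tokens]
```

Because every item is atomic once a union member has been chosen, the
result is still a flat list, so allowing this does not open the door to
lists of lists.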

3. Time duration

I would reiterate that I do not believe that the need for comparing
durations that mix precise and imprecise durations is legitimate and
does not justify the complexity inherent in the timeDuration
type as defined.  I believe that defining timeDuration as a union of a
timeDurationMonths and timeDurationSeconds would be sufficient for all
practical uses of timeDuration.  timeDurationMonths and
timeDurationSeconds would each be fully ordered and comparisons between
members of those types are straightforward.
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001JanMar/0218.html,
sorry about the formatting)

In a similar vein, it is looking very appealing to me to split all the
time-related types into qualified (with Z or time zone offset) and
unqualified (without) forms and to define unions for the
generic version.  If you want to specify ranges through min and max
constraints, you would be required to pick either the qualified or
unqualified form to extend and all the discussion on comparing
qualified and unqualified forms could be excised.

AM>> Something to consider for v1.1
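
A minimal sketch of the proposed split, assuming Python dataclasses as
stand-ins for the two types: DurationMonths and DurationSeconds are
hypothetical names modelled on the timeDurationMonths and
timeDurationSeconds suggested above.  Each class is totally ordered on
its own, and the union simply refuses to order values drawn from
different members.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True, order=True)
class DurationMonths:
    months: int          # imprecise duration, counted in Gregorian months


@dataclass(frozen=True, order=True)
class DurationSeconds:
    seconds: float       # precise duration, counted in seconds


# The generic duration is a union of the two fully ordered members.
TimeDuration = Union[DurationMonths, DurationSeconds]


def comparable(a: TimeDuration, b: TimeDuration) -> bool:
    """Values can be ordered only when they come from the same member type."""
    return type(a) is type(b)
```

With order=True, the generated comparison methods return NotImplemented
for operands of different classes, so an expression such as
DurationMonths(1) < DurationSeconds(2592000.0) raises TypeError instead
of yielding a questionable answer.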

4. Time zone offset

I would suggest explicitly stating limits for the acceptable values for
the time zone offset, specifically at least limiting the values between
-24:00 and +24:00 (or maybe +/-23:59) instead of the
+99:59 or -99:59 allowed now.  This does not affect the value space for
dateTime since time zone offset is strictly a formatting issue.
However, it would minimize the potential abuse of gMonth,
gYear, etc to be used to represent a duration that does not correspond to a
Gregorian Month, Year etc in some recognized time zone in the world.

AM>> Again, something to consider for v1.1
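
The suggested limit could be checked along the following lines.  This
is a sketch only: the function name and the limit_minutes parameter are
assumptions, the default bound is the +/-24:00 floated above, and the
"Z" form is deliberately not handled.

```python
import re

_OFFSET = re.compile(r"[+-](\d{2}):(\d{2})")


def offset_in_range(offset: str, limit_minutes: int = 24 * 60) -> bool:
    """Return True if a '+hh:mm' or '-hh:mm' offset lies within the limit."""
    m = _OFFSET.fullmatch(offset)
    if m is None:
        return False
    hours, minutes = int(m.group(1)), int(m.group(2))
    if minutes > 59:
        return False
    return hours * 60 + minutes <= limit_minutes
```

Passing limit_minutes=23 * 60 + 59 gives the stricter +/-23:59 variant;
either way an offset such as +99:59 is rejected.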

5. Lack of canonical form for hexBinary

hexBinary would allow either 0FB7, 0fb7, 0Fb7, or 0fB7 for the 16-bit
integer 4023.  I would recommend that use of the upper case A-F be the
canonical form.
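
A small sketch of the recommended canonicalization, assuming the only
rule is "upper-case the hex digits" (canonical_hex_binary is a
hypothetical helper name):

```python
import re


def canonical_hex_binary(lexical: str) -> str:
    """Validate a hexBinary literal and return it with A-F in upper case."""
    if not re.fullmatch(r"(?:[0-9A-Fa-f]{2})*", lexical):
        raise ValueError(f"{lexical!r} is not a valid hexBinary literal")
    return lexical.upper()


# All four spellings from the example collapse to one canonical form.
variants = ["0FB7", "0fb7", "0Fb7", "0fB7"]
canonical_forms = {canonical_hex_binary(v) for v in variants}
```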

6. Cyclical types (time, gMonthDay, gDay, gMonth)

After some reflection, it appears to me that recognizing the cyclical
nature of these types in the evaluation of min and max constraints
becomes untenably complex, especially when dealing with time zone
offsets and the irregularity of the lengths of months.  See
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001AprJun/0009.html
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001AprJun/0010.html
It would appear that the schema author could recognize the
cyclical nature of the underlying datatype and create an
appropriate union type to represent, for example, business hours in
Japan by creating a union type of those hours between the start of
business and 00:00:00Z and between 00:00:00Z and end of business.
However, this would support the previous suggestion to allow lists to be
composed of union and atomic datatypes.
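
The business-hours idea can be sketched as a union of two ordinary
ranges, neither of which wraps around midnight.  The hours used here
(08:30 to 17:30 Japan time, i.e. 23:30Z to 08:30Z) are an assumption
made purely for illustration, as is the function name.

```python
from datetime import time

# Assumed business hours: 08:30-17:30 JST (UTC+9), expressed in UTC.
START_UTC = time(23, 30)
END_UTC = time(8, 30)


def in_business_hours(t_utc: time) -> bool:
    """Membership in the union of two non-cyclical ranges of UTC times."""
    start_to_midnight = START_UTC <= t_utc   # member 1: start .. midnight
    midnight_to_end = t_utc <= END_UTC       # member 2: midnight .. end
    return start_to_midnight or midnight_to_end
```

Each member is a plain min/max restriction, so no validator has to know
that time is cyclical; only the schema author does.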

Editorial comments:

Section 2.4.1.1: Equal

If something appropriate is done with NaN, the "every value space
supports the notion of equality" will no longer hold.

"Note that a consequence of the above is that, given value space A..."

I'm not sure what the consequences of this statement in referencing
specs would be, for example, for the values corresponding to the literal
"3" in the double, float, and integer value spaces.

"for all a and b in the value space, a < b and b < a implies a=b"

I don't get this one: what would be an example of values a and b where
a < b and b < a?

AM>> We have been discussing some revised wording for this section.

Section 2.4.2.6: Whitespace

replace: This section explicitly lists tab, line feed and carriage
return.  Should the language be more inclusive, to include the additional
whitespace characters added to XML as proposed in
http://www.w3.org/TR/newline, or explicitly allow NEL?

AM>> We follow XML 1.0

"For all atomic datatypes other than string..."

This seems to contradict the note in section 3.2.17.1 that mentions that
spaces are allowed in the anyURI lexical space, but discouraged.

Whitespace is listed as a constraining facet in the tables for most
datatypes (such as boolean, number) even though it would be an error to
specify anything other than the value implied by the base
type.  Maybe it would be helpful to display it as 

whiteSpace [must be 'collapse']

or similar.

AM>> We'll consider this.
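
For reference, the replace and collapse behaviours discussed in this
section can be sketched as follows, assuming the XML 1.0 whitespace set
of space, tab, line feed and carriage return.  The function names are
illustrative.  Explicit character classes are used because Python's own
str.split() and str.strip() also treat NEL (U+0085) as whitespace.

```python
import re


def ws_replace(s: str) -> str:
    """whiteSpace = replace: tab, line feed and carriage return become spaces."""
    return re.sub(r"[\t\n\r]", " ", s)


def ws_collapse(s: str) -> str:
    """whiteSpace = collapse: replace, squeeze runs of spaces, trim the ends."""
    return re.sub(r" +", " ", ws_replace(s)).strip(" ")
```

Under these definitions ws_collapse("  true\t\n") yields "true", while
a NEL character is left untouched, which is the behaviour the question
about http://www.w3.org/TR/newline is concerned with.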

Section 3.2.6: timeDuration

If not radically changed per previous recommendations:

"six-dimension space": only two are needed: Gregorian months and seconds.

"The values of the Year, Month, Day, Hour and Minutes components are not
restricted but allow an arbitrary integer": non-negative integer.  Next
sentence, non-negative decimal.

"The lowest order item may have a decimal fraction"  This is
inconsistent with Y, M, D, H and M only being integers in the previous
section.  Only seconds should accept decimals.
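
One way to express "only seconds accept decimals" is as a lexical
pattern.  The regular expression below is a sketch of that reading of
the PnYnMnDTnHnMnS form, not the draft's own grammar, and the name
is_valid_duration is hypothetical.

```python
import re

_DURATION = re.compile(
    r"-?P(?=.)"                       # 'P' needs at least one component
    r"(?:\d+Y)?(?:\d+M)?(?:\d+D)?"    # years, months, days: integers only
    r"(?:T(?=\d)"                     # 'T' needs a time component after it
    r"(?:\d+H)?(?:\d+M)?"             # hours, minutes: integers only
    r"(?:\d+(?:\.\d+)?S)?)?"          # seconds: the only decimal component
)


def is_valid_duration(lexical: str) -> bool:
    """True if the literal fits the 'decimals only on seconds' reading."""
    return _DURATION.fullmatch(lexical) is not None
```

Under this pattern "P1Y2M3DT10H30M12.5S" is accepted, whereas "PT1.5H"
is rejected even though hours are the lowest-order item present.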

Section 3.2.8: time

"Since the lexical representation allows an optional time zone
indicator... it may not be able to determine the order of two values one
of which has a time zone and the other does not".  I can't see a
case when it ever would be possible to make a comparison between a
qualified time and an unqualified time since the potential time zone
variations are greater than 24 hours.

AM>> There is a section in the draft that gives an algorithm for making
AM>> such comparisons.

Section 3.2.9 and similar:

"If date values are considered periods of time, the order relation on
date values is the order relation on their starting instants."

This sentence appears with only a change of the type name in several
types.  My qualm is with the "If": I do not believe the sentence should
be conditional: date values are periods of time and the
order relation is based on their starting instants.  If not, then you
must have some mechanism to know when they are not and know what order
relation should be used in that circumstance.

AM>> This wording has been changed.

Received on Thursday, 5 April 2001 12:50:49 UTC