XML Schema datatypes: NaN, lists of union types, [NEL], miscellaneous editorial

I've finally had a chance to review the recent datatypes draft (really the 16 March one, but I believe the comments apply equally to the 30 March draft).  I'll try to list the issues in order of
significance, but I believe the first few must be addressed (or at least adequately explained).

1. Treatment of NaN

In the definition for float and double, the following sentence appears:

"Not-a-number equals itself and is greater than all float values including positive infinity."

This treatment of NaN appears to be incompatible with the treatment of NaN in the IEEE spec and in Java and was introduced into this last draft without any explanation.

For example, from the Java Language Specification: http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html#9290

"NaN is unordered, so the numerical comparison operators <, <=, >, and >= return false if either or both operands are NaN (§15.20.1). The equality operator == returns false if either operand is NaN,
and the inequality operator != returns true if either operand is NaN (§15.21.1). In particular, x!=x is true if and only if x is NaN, and (x<y) == !(x>=y) will be false if x or y is NaN."
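
To make the contrast concrete, here is a minimal Python sketch (my own illustration, not text from either spec).  Python floats follow the same IEEE comparison rules as the Java text quoted above.  The draft_key, draft_equal and draft_less helpers are hypothetical: they only model my reading of the draft's sentence, in which NaN equals itself and sorts above positive infinity.

```python
import math

nan = float("nan")
inf = float("inf")

# IEEE behaviour: every ordered comparison involving NaN is false,
# and NaN is not equal to anything, including itself.
ieee_results = {
    "nan < 1.0": nan < 1.0,
    "nan > inf": nan > inf,
    "nan == nan": nan == nan,
    "nan != nan": nan != nan,
}

def draft_key(x):
    """Hypothetical sort key modelling the draft's rule:
    NaN equals itself and is greater than every other value, including +INF."""
    return (1, 0.0) if math.isnan(x) else (0, x)

def draft_equal(a, b):
    # Equality as the draft describes it: NaN == NaN holds.
    return draft_key(a) == draft_key(b)

def draft_less(a, b):
    # Total order as the draft describes it: everything else < NaN.
    return draft_key(a) < draft_key(b)

values = [nan, 1.0, inf, -inf, 0.0]
ordered = sorted(values, key=draft_key)  # NaN ends up last
```

Under the IEEE rules only the "nan != nan" entry is true; under the draft's rule, draft_equal(nan, nan) and draft_less(inf, nan) both hold.  Any referencing spec that delegates to a host language's floating-point comparison would get the first behaviour, not the second.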

If this conceit has substantial value (like allowing double to be ordered), then it should be appropriately explained, and the draft should state how far this abnormal treatment of NaN needs to
propagate into other XML specs.

I am not aware of the motivation for this treatment.  However, if the motivation is to force double and float to be fully ordered and to support the equality concept (which the traditional NaN
treatment violates, since NaN != NaN), then I would suggest removing NaN from the lexical and value spaces of double and float, adding a distinct NaN type (the first datatype whose value space does
not support the notion of equality), and adding unions of double and NaN and of float and NaN as built-in datatypes.


2. Barring lists of union types

Section 2.5.1.2 repeatedly defines list datatypes as lists of atomic datatypes (as opposed to union datatypes or list datatypes).  Section 2.5.1.3 explicitly allows union datatypes to have members
that are either atomic or list datatypes.  I assume that union datatypes are excluded from lists to prevent indirectly allowing lists of lists.

I would suggest that the value of allowing lists of union types is substantially greater than allowing unions that include list types.  Unions of union types would also be significantly valuable.

I would recommend that any number of atomic or union datatypes be allowed to participate in a union datatype, that no list types be allowed to participate in a union, and that "atomic or union"
replace "atomic" in section 2.5.1.2.
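
The recommended rules can be sketched as a small checker.  This is only an illustration of the proposal, not of the draft as written; the class names Atomic, UnionType and ListType and the function is_valid are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Atomic:
    name: str

@dataclass(frozen=True)
class UnionType:
    members: Tuple["Datatype", ...]

@dataclass(frozen=True)
class ListType:
    item_type: "Datatype"

Datatype = Union[Atomic, UnionType, ListType]

def is_valid(dt) -> bool:
    """Check a datatype against the rules proposed above (not the draft's rules)."""
    if isinstance(dt, Atomic):
        return True
    if isinstance(dt, UnionType):
        # Members may be atomic or union, never list.
        return all(not isinstance(m, ListType) and is_valid(m) for m in dt.members)
    if isinstance(dt, ListType):
        # Items may be atomic or union, never list.  Since unions cannot
        # contain lists either, a list of lists cannot arise indirectly.
        return not isinstance(dt.item_type, ListType) and is_valid(dt.item_type)
    return False

integer = Atomic("integer")
nmtoken = Atomic("NMTOKEN")
int_or_token = UnionType((integer, nmtoken))

list_of_union = ListType(int_or_token)                       # allowed under the proposal
union_of_union = UnionType((int_or_token, Atomic("date")))   # allowed under the proposal
union_with_list = UnionType((integer, ListType(integer)))    # rejected under the proposal
list_of_list = ListType(ListType(integer))                   # rejected under the proposal
```

The point of the sketch is that barring lists from unions is enough to keep lists of lists out, so lists of unions need no special exclusion.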

3. Time duration

I would reiterate that I do not believe the need to compare durations that mix precise and imprecise components is legitimate, nor that it justifies the complexity inherent in the timeDuration
type as defined.  I believe that defining timeDuration as a union of timeDurationMonths and timeDurationSeconds would be sufficient for all practical uses of timeDuration.  timeDurationMonths and
timeDurationSeconds would each be fully ordered, and comparisons between members of each type would be straightforward.  (See
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001JanMar/0218.html; sorry about the formatting.)
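
A minimal sketch of the split, assuming each member type carries a single number.  The Python class names DurationMonths and DurationSeconds are hypothetical stand-ins for timeDurationMonths and timeDurationSeconds, and the helper comparable is likewise my own.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True, order=True)
class DurationMonths:
    months: int          # whole Gregorian months (years fold in as 12 months)

@dataclass(frozen=True, order=True)
class DurationSeconds:
    seconds: Fraction    # days, hours and minutes fold into seconds

one_month = DurationMonths(1)
one_year = DurationMonths(12)
thirty_days = DurationSeconds(Fraction(30 * 24 * 60 * 60))
one_hour = DurationSeconds(Fraction(3600))

def comparable(a, b) -> bool:
    """Two durations are comparable only if they come from the same member type."""
    return type(a) is type(b)
```

Within each type the order is total (one_month < one_year, one_hour < thirty_days).  Comparing one_month with thirty_days raises TypeError in this sketch, which mirrors the proposal: the question of whether a month is longer than 30 days simply never has to be answered.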

In a similar vein, it is looking very appealing to me to split all the time-related types into qualified (with Z or a time zone offset) and unqualified (without) forms and to define unions for the
generic versions.  If you wanted to specify ranges through min and max constraints, you would be required to pick either the qualified or the unqualified form to extend, and all the discussion on
comparing qualified and unqualified forms could be excised.

4. Time zone offset

I would suggest explicitly stating limits on the acceptable values for the time zone offset, at least restricting them to between -24:00 and +24:00 (or maybe +/-23:59) instead of the
+99:59 or -99:59 allowed now.  This does not affect the value space for dateTime since the time zone offset is strictly a formatting issue.  However, it would minimize the potential for abusing
gMonth, gYear, etc. to represent a duration that does not correspond to a Gregorian month, year, etc. in some recognized time zone in the world.
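
As an illustration of the suggested limit, here is a hedged sketch of a checker.  It assumes the time zone indicator is lexically either "Z" or (+|-)hh:mm; the function name offset_minutes and the limit_minutes parameter are hypothetical, and the default of 24:00 is just the looser of the two limits suggested above.

```python
import re

# Assumed lexical form of a time zone indicator: "Z" or (+|-)hh:mm.
_OFFSET = re.compile(r"Z|([+-])([0-9]{2}):([0-9]{2})")

def offset_minutes(tz: str, limit_minutes: int = 24 * 60) -> int:
    """Parse a time zone indicator and reject offsets beyond the given limit."""
    m = _OFFSET.fullmatch(tz)
    if m is None:
        raise ValueError(f"not a time zone indicator: {tz!r}")
    if tz == "Z":
        return 0
    sign, hh, mm = m.groups()
    if int(mm) > 59:
        raise ValueError(f"minutes out of range: {tz!r}")
    total = int(hh) * 60 + int(mm)
    if total > limit_minutes:
        raise ValueError(f"offset exceeds limit: {tz!r}")
    return -total if sign == "-" else total
```

With the default limit, "+09:00" and "+24:00" are accepted while "+99:59" is rejected; passing limit_minutes=23 * 60 + 59 gives the stricter +/-23:59 variant.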

5. Lack of canonical form for hexBinary

hexBinary would allow any of 0FB7, 0fb7, 0Fb7, or 0fB7 for the 16-bit integer 4023.  I would recommend that the upper-case digits A-F be used in the canonical form.
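
A short sketch of what the canonicalization would amount to, assuming the lexical space is an even number of hex digits in either case.  The function name canonical_hex_binary is hypothetical.

```python
import re

# Assumption: the lexical space is zero or more pairs of hex digits.
_HEX_PAIRS = re.compile(r"(?:[0-9a-fA-F]{2})*")

def canonical_hex_binary(lexical: str) -> str:
    """Return the proposed canonical form: the same digits with A-F in upper case."""
    if _HEX_PAIRS.fullmatch(lexical) is None:
        raise ValueError(f"not a hexBinary literal: {lexical!r}")
    return lexical.upper()

variants = ["0FB7", "0fb7", "0Fb7", "0fB7"]
canonical = {canonical_hex_binary(v) for v in variants}   # a single canonical form

# All four literals denote the same two octets.
value = int.from_bytes(bytes.fromhex("0FB7"), "big")
```

All four variants map to the one string 0FB7, and the decoded value is 4023 as stated above.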

6. Cyclical types (time, gMonthDay, gDay, gMonth)

After some reflection, it appears to me that recognizing the cyclical nature of these types in the evaluation of min and max constraints becomes untenably complex, especially when
dealing with time zone offsets and the irregular lengths of months.  See http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001AprJun/0009.html and
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001AprJun/0010.html.  It would appear that the schema author could recognize the cyclical nature of the underlying datatype and create an
appropriate union type to represent, for example, business hours in Japan, as a union of the hours between the start of business and 00:00:00Z and those between 00:00:00Z and the end of business.
This would also support the previous suggestion to allow lists to be composed of union and atomic datatypes.
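
A sketch of the business-hours union, under assumed hours of 08:00 to 17:00 Japan time (UTC+09:00), that is, 23:00Z through 08:00Z.  The hours, the constant names and the three functions are hypothetical; each member is a plain, non-cyclical range of the kind min and max constraints can already express.

```python
from datetime import time

# Hypothetical business hours: 08:00-17:00 at UTC+09:00,
# i.e. 23:00Z on the previous day through 08:00Z.
START_UTC = time(23, 0, 0)
END_UTC = time(8, 0, 0)
MIDNIGHT = time(0, 0, 0)

def in_late_member(t: time) -> bool:
    """First union member: from the start of business up to (not including) 24:00Z."""
    return START_UTC <= t

def in_early_member(t: time) -> bool:
    """Second union member: from 00:00:00Z through the end of business."""
    return MIDNIGHT <= t <= END_UTC

def in_business_hours(t: time) -> bool:
    """A value is valid for the union if it is valid for either member."""
    return in_late_member(t) or in_early_member(t)
```

Neither member needs to know that time wraps around at midnight; the wrap-around is expressed entirely by the union.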

Editorial comments:

Section 2.4.1.1: Equal

If something appropriate is done with NaN, the statement that "every value space supports the notion of equality" will no longer hold.

"Note that a consequence of the above is that, given value space A..."

I'm not sure what the consequences of this statement would be in referencing specs, for example, for the values corresponding to the literal "3" in the double, float, and integer value spaces.

"for all a and b in the value space, a < b and b < a implies a=b"

I don't get this one.  What would be an example of values a and b where a < b and b < a?  With a strict order no such pair can exist, so was "a <= b and b <= a" intended?

Section 2.4.2.6: Whitespace

replace: This section explicitly lists tab, line feed and carriage return.  Should the language be more inclusive, to cover the additional whitespace characters proposed for XML in
http://www.w3.org/TR/newline, or explicitly allow NEL?
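
To show what is at stake, here is a sketch of "replace" and "collapse" as I read the section: only tab, line feed and carriage return are turned into spaces.  The function names ws_replace and ws_collapse are hypothetical.

```python
import re

NEL = "\u0085"   # NEXT LINE, not among the characters the section lists

def ws_replace(s: str) -> str:
    """whiteSpace 'replace' as listed: tab, line feed, carriage return -> space."""
    return re.sub(r"[\t\n\r]", " ", s)

def ws_collapse(s: str) -> str:
    """whiteSpace 'collapse': replace, then squeeze runs of spaces and trim both ends."""
    return re.sub(r" +", " ", ws_replace(s)).strip(" ")

sample = "a\tb\r\nc" + NEL + "d"
collapsed = ws_collapse(sample)   # the NEL between c and d survives untouched
```

With the list as written, a NEL reaching the schema processor is treated as ordinary content rather than as a separator, which is the case the question above is about.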

"For all atomic datatypes other than string..."

This seems to contradict the note in section 3.2.17.1 that mentions that spaces are allowed in the anyURI lexical space, but discouraged.

Whitespace is listed as a constraining facet in the tables for most datatypes (such as boolean, number) even though it would be an error to specify anything other than the value implied by the base
type.  Maybe it would be helpful to display it as 

whiteSpace [must be 'collapse']

or similar.

Section 3.2.6: timeDuration

If not radically changed per previous recommendations:

"six-dimension space": only two dimensions are needed, Gregorian months and seconds.

"The values of the Year, Month, Day, Hour and Minutes components are not restricted but allow an arbitrary integer": this should say non-negative integer.  In the next sentence, non-negative decimal.

"The lowest order item may have a decimal fraction"  This is inconsistent with Y, M, D, H and M only being integers in the previous section.  Only seconds should accept decimals.

Section 3.2.8: time

"Since the lexical representation allows an optional time zone indicator... it may not be able to determine the order of two values one of which has a time zone and the other does not".  I can't see a
case in which it would ever be possible to make a comparison between a qualified time and an unqualified time, since the potential time zone variations are greater than 24 hours.
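
A rough sketch of that argument, treating a time of day as a minute on a 24-hour cycle.  The offset bounds of -12:00 and +14:00 are my assumption about the spread of civil time zones, not something the draft states, and possible_utc_minutes is a hypothetical helper.

```python
# Assumption: an unqualified time may be in any zone from -12:00 to +14:00.
MIN_OFFSET_MIN = -12 * 60
MAX_OFFSET_MIN = 14 * 60
DAY = 24 * 60

def possible_utc_minutes(local_minutes: int) -> set:
    """All UTC minute-of-day values that an unqualified time could denote."""
    return {
        (local_minutes - off) % DAY
        for off in range(MIN_OFFSET_MIN, MAX_OFFSET_MIN + 1)
    }

# An unqualified 09:00 could fall on any minute of the UTC day.
candidates = possible_utc_minutes(9 * 60)
covers_whole_day = len(candidates) == DAY
```

Because the assumed spread of offsets (26 hours) exceeds the length of the cycle, every qualified time falls inside the candidate set, so no qualified time is determinately before or after the unqualified one.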

Section 3.2.9 and similar:

"If date values are considered periods of time, the order relation on date values is the order relation on their starting instants."

This sentence appears, with only the type name changed, in several types.  My qualm is with the "If": I do not believe the sentence should be conditional.  Date values are periods of time, and the
order relation is based on their starting instants.  If not, then there must be some mechanism to know when they are not and what order relation should be used in that circumstance.

Received on Wednesday, 4 April 2001 16:38:57 UTC