RE: SAG-FO-02 follow-up: Durations

Gentlepeople,

In this message, I respond to Mike Kay's suggestions and proposal in [1], 
regarding the treatment in XQuery and XPath of durations.

I would like to preface my arguments with the statement that I have the 
greatest respect for Mike Kay, who has contributed immeasurably to the 
XQuery and XPath specifications, to the XSLT specs and one of the most 
widely used implementations of XSLT, and to much else involved in these 
technologies.  I very often agree with Mike's suggestions and proposals 
that clarify, and frequently simplify, the XQuery and XPath specifications 
and languages.  It is rare that I disagree with his directions so 
forcefully as I do here.

While I understand Mike's concern over the complexity of dealing with 
durations (which concern is readily extended to dealing with dates and 
times in general), it is largely that very complexity that causes me 
discomfort with the overall direction of his proposed solution.

I think it's fair to summarize Mike's proposal in [1] thus: eliminate all 
uses of the xdt:yearMonthDuration and xdt:dayTimeDuration types, as well as 
the types themselves, and replace them with operations involving only the 
existing xs:duration type.  There are, of course, a number of details in 
the proposal in support of the high-level goal.

Now, my principal reason for objecting to this direction is based on the 
complexity that Mike rightfully deplores.  He proposes that we simplify the 
specifications for various documents in the XQuery/XPath suite and the 
lives of the WG members working on those documents.  Unfortunately, my two 
decades of experience in this area has convinced me that this has the 
unfortunate side effect of saddling the users (the query writers, that is) 
with solving the problems that Mike observes to be so complicated in the 
XQuery/XPath documents.  I am unwilling to require tens of thousands of 
query writers to figure out how to deal with an issue that was deemed 
unsolvable by the 30 or 40 people in the world who are most expert in the 
subject of XQuery and XPath.  If we, the members of the XQuery WG, have so 
much difficulty getting this right, how many thousands of errors, subtle 
and devious, will be produced by query authors who are trying to solve 
their business problems but are forced to solve data type problems instead?

In many ways, Mike's proposal is attractive.  On the surface, it appears to 
simplify a lot of things.  For example, it removes a large number of 
functions from the F&O specification [3] (as well as removing a number of 
lines from various tables).  As somebody employed by an XQuery implementor, 
I am concerned (as is, of course, Mike) with having such a large number of 
functions to specify, implement, test, and document.  However, I am much more 
concerned with having to deal with what will be (in my opinion) a 
significant number of support calls related to user-written bugs that might 
be avoided by supporting durations natively in XQuery.

It is true that an *implementation* of XQuery might choose to represent a 
value of type xdt:yearMonthDuration as an integer number of months and a 
value of type xdt:dayTimeDuration as a decimal number of seconds.  And it 
is also true that an *implementation* of XQuery (and of XML Schema) might 
choose to represent a value of type xs:duration as a 2-tuple containing the 
number of months and the number of seconds.  But I disagree that such a 
representation is the right paradigm to present to our users.  In fact, the 
XML Schema WG is believed to be considering the xdt:yearMonthDuration and 
xdt:dayTimeDuration types for inclusion in a future version of the XML 
Schema Recommendation.  To me, that suggests that data type experts are 
concerned with the practical use of xs:duration.  (Full disclosure: I have 
argued that the XML Schema WG adopt the xdt: types related to durations for 
reasons closely related to my positions in this message.)

I believe that the reality of the situation is that durations are a useful 
type, one whose values have particular semantics that should be observed on 
their own, without being manipulated in some other surrogate form.  Yes, of 
course it is possible to use numerics as a surrogate for durations --- and 
for dateTime values as well.  It's possible to do so with character strings, 
too.  (The proof is that all computers do exactly this internally...the 
one, two, or more 8-bit bytes used to represent a character is nothing but 
a binary number.)  But characters and character strings are sufficiently 
useful that nobody seriously proposes that they be handled as numbers.  Of 
course, character strings are pretty well understood by most people 
(although Unicode has offered some challenges to some of those people), 
while durations are inherently more complex.

Let's look at a detail or two.  In [1], Mike uses this example:

    A dateTime with lexical representation 1999-05-31T13:20:00-05:00 has a 
    value represented by the tuple (1999-05-31T18:20:00Z, -18000)

I consider myself fairly adept at mental arithmetic, but I have to pause 
for a while to figure out that "-18000" means "Eastern Standard Time" in the 
USA, Canada, and probably other locations --- in other words, that it means 
"-05:00", or "5 time zones to the west of UTC".  In fairness, Mike has not 
proposed that the lexical representation of the time zone component of an 
xs:dateTime be changed.  But he does propose that the *value space* 
representation be changed to a number of seconds, and it is the value space 
representation that application/query writers have to manipulate.
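
To make that mental arithmetic concrete, here is a minimal sketch in Python (used purely for illustration; it is not XQuery and not part of Mike's proposal) showing that an offset of -18000 seconds is the same thing as "-05:00", using the dateTime from the example quoted above.

```python
from datetime import datetime, timedelta, timezone

# The value-space offset from the quoted example, in seconds.
offset = timedelta(seconds=-18000)

# The lexical form 1999-05-31T13:20:00-05:00, built from that offset.
local = datetime(1999, 5, 31, 13, 20, tzinfo=timezone(offset))

print(local.isoformat())                           # 1999-05-31T13:20:00-05:00
print(local.astimezone(timezone.utc).isoformat())  # 1999-05-31T18:20:00+00:00
```

The machine does the division by 3600 effortlessly; my point is only that the human reader should not have to.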

Mike also proposes that:

    The elapsed time between two dates, times, or dateTimes is generally 
    handled as a number of seconds, expressed as an xs:double. Some functions 
    are also provided that manipulate a duration as an integer number of 
    months. All arithmetic, comparison, and sorting of durations is achieved by 
    expressing the duration as an xs:integer number of months plus an xs:double 
    number of seconds, and then manipulating these values using conventional 
    numeric arithmetic.

As I said above, this is a perfectly reasonable mechanism for 
implementations to use internally.  But I believe that it is generally 
awkward and confusing to have humans manipulate this notation for a data 
type that inherently has structure to it.  Quick now, what is the 
difference between 11:00 and 17:00?  Six hours, right?  Sure, that's the 
same as 21600 seconds, but why is that important in this particular 
calculation?

Worse, imagine having to write a query that asks "What is the sum of 3 
hours 45 minutes and 4 hours 15 minutes?"  Should a query writer have to 
first translate each of the values into a numeric value, then perform the 
addition, then transform the result back into a duration?  Or would the 
query writer be better off with merely adding the two values 
together?  Consider the following examples (which I have sincerely 
attempted to write fairly).  Assume that an XQuery variable $dur1 contains 
the first value, 3 hours 45 minutes, and another variable $dur2 contains 
the second value, 4 hours 15 minutes.  (In the existing F&O specification, 
the types of $dur1 and $dur2 would be xdt:dayTimeDuration; under Mike's 
proposal, both would have the type xs:duration.)

(1) fn:make-duration ( 0, ( fn:get-seconds-from-duration ( $dur1 ) + 
fn:get-seconds-from-duration ( $dur2 ) ) )

(2) $dur1 + $dur2

Given that choice, I think that most query authors would prefer the 
simplicity (and, probably less important, brevity) of example (2).
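
The same contrast can be sketched outside XQuery.  The following is a minimal Python illustration, in which timedelta stands in (as an assumption of mine, not an exact equivalent) for a day-time duration; it shows the "convert, add, convert back" style next to direct addition of the two values from the question above.

```python
from datetime import timedelta

dur1 = timedelta(hours=3, minutes=45)
dur2 = timedelta(hours=4, minutes=15)

# Style of example (1): go to seconds, add, rebuild a duration.
total_seconds = dur1.total_seconds() + dur2.total_seconds()
via_numbers = timedelta(seconds=total_seconds)

# Style of example (2): add the durations directly.
direct = dur1 + dur2

print(via_numbers == direct)  # True
print(direct)                 # 8:00:00
```

Both routes give 8 hours, but only one of them asks the author to think about 28800 seconds along the way.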

Again, ask the question "What time will it be 4 hours 30 minutes after 
8:00?".  In this case, $time1 has type xs:time in both examples and 
contains the time 08:00.  $dur1 has either type xs:duration or type 
xdt:dayTimeDuration, both representing 4 hours 30 minutes.

(3) fn:add-seconds-to-time ( $time1, fn:get-seconds-from-duration ( $dur1 ) )

(4) $time1 + $dur1

Again, I feel that the second example would generally be preferred by most 
query authors.

Now, in [2], Jeni Tennison has correctly observed that F&O fails to provide 
for the obvious operations of division of an xdt:yearMonthDuration by 
another xdt:yearMonthDuration and division of an xdt:dayTimeDuration by 
another xdt:dayTimeDuration, as well as subtraction of two dates from each 
other to get an xdt:yearMonthDuration (rather than an 
xdt:dayTimeDuration).  That omission is certainly something that could be 
corrected (even though I note in passing that the SQL standard does not 
provide for division of one duration/INTERVAL by another).

But I am not so comfortable with Mike's need to compute average speeds by 
dividing a number by a duration --- even though that is made easier by his 
proposal and the equivalent using existing F&O capabilities is rather 
tedious. By the way, I think the two approaches to solving this problem 
would look something like this.  In these examples, $dur1 is either an 
xdt:dayTimeDuration or an xs:duration, and $dist is an xs:double.

(5) $dist div fn:get-seconds-from-duration ( $dur1 )

(6) $dist div ( ( ( fn:get-days-from-dayTimeDuration ( $dur1 ) * 24 +
                     fn:get-hours-from-dayTimeDuration ( $dur1 ) ) * 60 +
                     fn:get-minutes-from-dayTimeDuration ( $dur1 ) ) * 60 +
                     fn:get-seconds-from-dayTimeDuration ( $dur1 ) )

It's very clear that Mike's alternative is much shorter and easier to 
understand.  However, one must ask whether this is a typical question that 
will be asked using durations.  In some businesses, it will be; in others, it 
will not be.  I suggest that, in businesses where such a question is 
common, it is easy enough to write a user-defined function 
(my:convert-dayTimeDuration-to-seconds, for example) to be used by every 
query needing this sort of computation, which would reduce example (6) to:

(7) $dist div my:convert-dayTimeDuration-to-seconds ( $dur1 )

And (7) is not significantly different from (5).
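
For what it is worth, the arithmetic inside such a helper is small.  Here is a minimal sketch in Python (not XQuery); the function name, the component-wise signature, and the sample distance are all hypothetical, chosen only to show the days/hours/minutes/seconds reduction that the helper would hide from each query.

```python
from datetime import timedelta

def day_time_duration_to_seconds(days, hours, minutes, seconds):
    """Reduce the four day-time components to a single number of seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

dist = 150.0  # hypothetical distance, in kilometres
elapsed = day_time_duration_to_seconds(0, 2, 30, 0)  # 2 hours 30 minutes

print(elapsed)  # 9000
speed_km_per_second = dist / elapsed
```

Written once and shared, the helper keeps the conversion factors out of every individual query.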

We come now to my final source of discomfort, and this is one that I do not 
believe can be papered over by workarounds.  In Mike's proposal, he deletes 
the two newly-defined subtypes of xs:duration, xdt:yearMonthDuration and 
xdt:dayTimeDuration.  A prime reason for the creation of those two subtypes 
is that each of them is "totally ordered", meaning that one can compare two 
values of one type and unambiguously determine whether the first is greater 
than the second, or less than it, or equal to it.  By contrast, one cannot 
do that with values of the xs:duration type in the general case.  I believe 
that the same deficiency would result from Mike's proposal, leaving the 
user to deal with the consequences.

Consider two values of type xs:duration under Mike's proposal.  One of 
these, $dur1, is (effectively) constructed from fn:make-duration(2, 0) and 
the other, $dur2, from fn:make-duration(1, 30*24*60*60).  What is the 
result of comparing $dur1 and $dur2?  It is not possible to determine this 
unambiguously.  The reason is simple: We do not know how many days there 
are in a month!  Some months have 31 days, others have 30, and February has 
either 28 or 29 days (depending on the year involved).  Therefore, the 
expression "30*24*60*60" 
may or may not equal 1 month.  Consequently, the type xs:duration is *not* 
totally ordered.  That means, of course, that some pairs of values, such as 
fn:make-duration(2, 0) and fn:make-duration(1, 1*24*60*60), can be compared 
without ambiguity (2 months is greater than 1 month and 1 day), while 
others cannot.
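
The ambiguity is easy to demonstrate.  The following minimal Python sketch (not XQuery) applies "2 months" and "1 month plus 30 days" to three different starting dates and compares where each lands.  The add_months helper is hypothetical, and I am assuming that the months are added before the days and that the day of the month is clamped to the length of the target month; other conventions are possible.

```python
import calendar
from datetime import date, timedelta

def add_months(start, months):
    """Add whole calendar months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def compare(start):
    """Return -1, 0, or 1 as '2 months' is less than, equal to, or
    greater than '1 month + 30 days' when both are added to start."""
    two_months = add_months(start, 2)
    month_plus_30_days = add_months(start, 1) + timedelta(days=30)
    return (two_months > month_plus_30_days) - (two_months < month_plus_30_days)

for start in (date(2003, 1, 1), date(2003, 3, 1), date(2003, 7, 1)):
    print(start, compare(start))
# 2003-01-01 -1
# 2003-03-01 0
# 2003-07-01 1
```

The same pair of durations compares as less than, equal to, and greater than, depending only on the starting date.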

The existing F&O specification that introduces and uses the two xdt: types 
avoids the mixing of year-month durations and day-time durations 
specifically because of this problem.  In Mike's proposal, he suggests (in 
changes to Annex C.5) that one could define comparison of two values of 
type xs:duration by comparing their months values and also comparing their 
seconds values and require that both must be equal in order for the result 
of the comparison to indicate equality.

But he admits that greater than and less than comparisons are more 
problematic because of the partially-ordered nature of xs:duration.  He 
suggests a pragmatic solution that uses 365.242199 as the average number of 
days in a year.  That works well when the values involved are intended for 
statistical use (e.g., over thousands, probably even hundreds, of years), 
but it doesn't work nearly as well for computations involving one year or 
two years or 1 month!
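
A minimal Python sketch (not XQuery) shows the short-range problem.  I am assuming here that an average month is simply the average year divided by 12, which is my reading and may not match the exact wording of the proposal.  Under that assumption, "1 month" always compares greater than "30 days", whereas the calendar agrees only some of the time.

```python
import calendar

AVG_DAYS_PER_YEAR = 365.242199
AVG_DAYS_PER_MONTH = AVG_DAYS_PER_YEAR / 12  # a little under 30.44 days

# Under the averaging rule, 1 month is always more than 30 days.
print(AVG_DAYS_PER_MONTH > 30)  # True

# Actual month lengths for one sample year.
lengths = [calendar.monthrange(2003, m)[1] for m in range(1, 13)]
longer = [m for m, n in enumerate(lengths, start=1) if n > 30]
equal = [m for m, n in enumerate(lengths, start=1) if n == 30]
shorter = [m for m, n in enumerate(lengths, start=1) if n < 30]

print(len(longer), len(equal), len(shorter))  # 7 4 1
```

In 2003 the averaged answer matches the calendar for only 7 of the 12 months.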

In short (in case I have not been clear), I cannot support, and will 
oppose, acceptance of this proposal on the grounds that it quite often 
makes the application/query authors' lives more difficult, raises the 
probability of errors that will result in support costs borne by vendors, 
and loses the safety provided by the xdt: types, all with the goal of 
simplifying the lives of the authors of the XQuery/XPath suite of documents.

Hope this helps,
    Jim

[1] http://lists.w3.org/Archives/Public/public-qt-comments/2003Sep/0114.html

[2] http://lists.w3.org/Archives/Public/public-qt-comments/2003Sep/0114.html

[3] http://www.w3.org/TR/xpath-functions/



========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
Oracle Corporation            Oracle Email: mailto:jim.melton@oracle.com
1930 Viscounti Drive          Standards email: mailto:jim.melton@acm.org
Sandy, UT 84093-1063              Personal email: mailto:jim@melton.name
USA                                                Fax : +1.801.942.3345
========================================================================
=  Facts are facts.  However, any opinions expressed are the opinions  =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
======================================================================== 

Received on Monday, 24 November 2003 02:44:57 UTC