RE: SAG-FO-02 follow-up: Durations

<Mike Kay>
So why don't we have special data types for lengths, weights, voltages, 
and temperatures? None of this reasoning convinces me that durations are 
any different from other units of measure, or that there is a good case 
for treating them differently.
</Mike Kay>

Mike,
Here's one reason: XML Schema provides built-in types for dates, times, 
and durations. It does not provide built-in types for lengths, weights, 
voltages, or temperatures. It is reasonable for a query language built 
around the type system of XML Schema to provide operations on the built-in 
Schema types and not on other types. It makes sense to subtract two dates, 
or to add a duration to a time. XQuery and XPath-2 reasonably support 
these operations, as users would expect.

Here's a second reason: SQL, which also has built-in types for dates, 
times, and durations, has supported arithmetic operations on these types 
for a quarter century. There is a lot of experience with date/time 
arithmetic in the database community, and there is a lot of stored data 
that uses dates, times, and durations. I do not think that users will take 
kindly to a new query language that considers the difference between two 
dates to be a floating point number of seconds. I think that XQuery and 
XPath-2 should provide support for dates and times that is similar to that 
provided by other query languages.

Here's a third reason: Your proposal in
http://lists.w3.org/Archives/Public/public-qt-comments/2003Sep/0114.html
contains a function named op:add-seconds-to-date($arg1 as xs:date, $arg2 
as xs:double).
I assume from the "op:" prefix that this function implements the "+" 
operator between the types xs:date and xs:double. But I am very suspicious 
of the expression xs:date("2003-11-30") + 47. That looks like a type error 
to me and I would expect my system to warn me about it. If I understand 
your proposal, this expression is valid and returns the date 2003-11-30 
since 47 seconds is less than one day. This kind of thing seems dangerous 
to me. I think we can do better.

Regards,
--Don Chamberlin




"Kay, Michael" <Michael.Kay@softwareag.com> 
Sent by: w3c-xml-query-wg-request@w3.org
12/01/2003 07:33 AM

To
Jim Melton <jim.melton@oracle.com>, public-qt-comments@w3.org
cc
w3c-xml-query-wg@w3.org, w3c-query-operators@w3.org
Subject
RE: SAG-FO-02 follow-up: Durations







This is a response to Jim Melton's response to Software AG's proposal to 
eliminate many of the functions and operators on durations, relying 
instead on numeric arithmetic.
JM> I think it's fair to summarize Mike's proposal in [1] thus: eliminate 
all uses of the xdt:yearMonthDuration and xdt:dayTimeDuration types, as 
well as the types themselves, and replace them with operations involving 
only the existing xs:duration type.  There are, of course, a number of 
details in the proposal in support of the high-level goal. 
JM> Now, my principle reason for objecting to this direction is based on 
the complexity that Mike rightfully deplores.  He proposes that we 
simplify the specifications for various documents in the XQuery/XPath 
suite and the lives if the WG members working on those documents. 
Unfortunately, my two decades of experience in this area has convinced me 
that this has the unfortunate side effect of saddling the users (the query 
writers, that is) with solving the problems that Mike observes to be so 
complicated in the XQuery/XPath documents.
MK> I don't think this is true. I think that very many of the functions 
and operators we are providing are straight duplicates of what you could 
do more simply with regular arithmetic.
Certainly, adding, multiplying, comparing, and sorting durations expressed 
as a number of seconds or as a number of months is not difficult. By 
asking users to do these operations in the same way as they manipulate 
lengths, weights, or numbers of paperclips, we are making their lives 
easier, not harder.
The only things that are special and difficult about durations are (a) the 
interaction between month-durations and second-durations (which is a 
problem we have washed our hands of), and (b) computations that combine 
dates (and times) with durations, where the Software AG proposal retains 
the functionality of the current specification.
JM> I am unwilling to require tens of thousands of query writers to figure 
out how to deal with an issue that was deemed unsolvable by the 30 or 40 
people in the world who are most expert in the subject of XQuery and 
XPath.  If we, the members of the XQuery WG, have so much difficulty 
getting this right, how many thousands of errors, subtle and devious, will 
be produced by query authors who are trying to solve their business 
problems but are forced to solve data type problems instead?
MK> I would like to see concrete examples. I think that getting rid of all 
these functions makes users' lives easier. For example, it's not at all 
obvious to the casual reader that the operation to multiply a duration by 
a number is not what they need when calculating how much to pay someone 
who has worked for four hours at a rate of $10 per hour. In fact, I think 
it's quite hard for that user to wade through our list of dozens of 
functions and discover that none of them meets this need. If the duration 
was represented by a number, like every other quantity, then the 
difficulty would not arise.
JM> In many ways, Mike's proposal is attractive.  On the surface, it 
appears to simplify a lot of things.  For example, it removes a large 
number of functions from the F&O specification [3] (as well as removing a 
number of lines from various tables).  As somebody employed by an XQuery 
implementor, I am concerned (as is, of course, Mike) with having such a 
large number of functions to specify, implement, test, and document. 
However, I much more concerned with having to deal with what will be (in 
my opinion) a significant number of support calls related to user-written 
bugs that might be avoided by supporting durations natively in XQuery. 
JM> It is true that an *implementation* of XQuery might choose to 
represent a value of type xdt:yearMonthDuration as an integer number of 
months and a value of type xdt:dayTimeDuration as a decimal number of 
seconds.  And it is also true that an *implementation* of XQuery (and of 
XML Schema) might choose to represent a value of type xs:duration as a 
2-tuple containing the number of months and the number of seconds.  But I 
disagree that such a representation is the right paradigm to present to 
our users.  In fact, the XML Schema WG is believed to be considering the 
xdt:yearMonthDuration and xdt:dayTimeDuration types for inclusion in a 
future version of the XML Schema Recommendation.  To me, that suggests 
that data type experts are concerned with the practical use of 
xs:duration.  (Full disclosure: I have argued that the XML Schema WG adopt 
the xdt: types related to durations for reasons closely related to my 
positions in this message.)
JM> I believe that the reality of the situation is that durations are a 
useful type, one whose values have particular semantics that should be 
observed on their own, without being manipulated in some other surrogate 
form.  Yes, of course it is possible to use numerics as a surrogate for 
durations --- and for dateTime value as well.  It's possible to do so with 
character strings, too.  (The proof is that all computers do exactly this 
internally...the one, two, or more 8-bit bytes used to represent a 
character is nothing but a binary number.)  But characters and character 
strings are sufficiently useful that nobody seriously proposes that they 
be handled as numbers.  Of course, character strings are pretty well 
understood by most people (although Unicode has offered some challenges to 
some of those people), while durations are inherently more complex.
MK> So why don't we have special data types for lengths, weights, 
voltages, and temperatures? None of this reasoning convinces me that 
durations are any different from other units of measure, or that there is 
a good case for treating them differently. If we did really clever things 
with "complex" durations (those that mix months and seconds), then there 
might be some argument. But we've decided not to handle complex durations, 
and only handle those that map trivially to numbers: which means, as I see 
it, that our data types are not adding any value. 
JM> Let's look at a detail or two.  In [1], Mike uses this example: 
A dateTime with lexical representation 1999-05-31T13:20:00-05:00 
has a value represented by the tuple (1999-05-31T18:20:00Z, -18000) 
JM> I consider myself fairly adept at mental arithmetic, but I have to 
pause for awhile to figure out that "-18000" means "Eastern Standard Time" 
in the USA, Canada, and probably other locations --- in other words, that 
it means "-05:00", or "5 timezones to the west of UTC".  In fairness, Mike 
has not proposed that the lexical representation of the time zone 
component of an xs:dateTime be changed.  But he does propose that the 
*value space* representation be changed to a number of seconds, and it is 
the value space representation that application/query writers have to 
manipulate.
MK> Well, there is at least one country in the world that still uses 
non-metric units for lengths. If you're going to manipulate lengths in 
feet and inches, then you're going to have to get used to doing a lot of 
arithmetic to convert between different representations of a length. I ask 
the question again: why are durations different? And if you don't like 
using constants like -18000, it's easy to write them as variables 
($TZ:EDT, for example) or as expressions (for example: tz(-5)). 
JM> Mike also proposes that: 
The elapsed time between two dates, times, or dateTimes is 
generally handled as a number of seconds, expressed as an xs:double. Some 
functions are also provided that manipulate a duration as an integer 
number of months. All arithmetic, comparison, and sorting of durations is 
achieved by expressing the duration as an xs:integer number of months 
plus an xs:double number of seconds, and then manipulating these values 
using conventional numeric arithmetic. 
JM> As I said above, this is a perfectly reasonable mechanism for 
implementations to use internally.  But I believe that it is generally 
awkward and confusing to have humans manipulate this notation for a data 
type that inherently has structure to it.  Quick now, what is the 
difference between 11:00 and 17:00?  Six hours, right?  Sure, that's the 
same as 21600 seconds, but why is that important in this particular 
calculation?
MK> Well, having calculated the difference, what are you going to do with 
it? If you want to display it as "six hours", then I would suggest that 
translating "PT6H" to "six hours" is likely to be just as difficult as 
translating "21600". If you want to do anything with the result other than 
display it (for example, to calculate the average speed over your journey, 
or to plot it on a graph in SVG), then the value 21600 is much more useful 
to you than the value PT6H.
JM> Worse, imagine having to write a query that asks "What is the sum of 3 
hours 45 minutes and 4 hours 15 minutes?"  Should a query writer have to 
first translate each of the values into a numeric value, then perform the 
addition, then transform the result back into a duration?  Or would the 
query writer be better off with merely adding the two values together? 
Consider the following examples (which I have sincerely attempted to write 
fairly).  Assume that an XQuery variable $dur1 contains the first value, 3 
hours 45 minutes, and another variable $dur2 contains the second value, 4 
hours 45 minutes.  (In the existing F&O specification, the types of $dur1 
and $dur2 would be xdt:dayTimeDuration; under Mike's proposal, both would 
have the type xs:duration.)
(1) fn:make-duration ( 0, ( fn:get-seconds-from-duration ( $dur1 ) + 
fn:get-seconds-from-duration ( $dur2 ) ) ) 
(2) $dur1 + $dur2 
Given that choice, I think that most query authors would prefer the 
simplicity (and, probably less important, brevity) of example (2).
MK> Firstly, you are assuming that the data starts of as a duration. My 
expectation is that usually, it won't. For a start, I think that it's much 
more common in persistent data to hold absolute dates/times than to hold 
durations; and in many fields of application, where people do hold 
durations, I think they will already be held as numbers. XBRL, for 
example, always holds duration information as a start date and end date; 
while I have seen XML files that compare the performance of different 
software products under different operational conditions, and they 
invariably represent elapsed times as numbers. They typically start life 
as numbers, and I think most people would need a compelling reason to 
translate the number 132.0578 into the duration PT132.0578S; the fact that 
the duration data type exists isn't of itself a reason for using it. 
I agree that your example 2 is easier, and my advice would be that to 
achieve this simplicity, all you need to do is represent your durations as 
numbers, just as you would represent lengths or monetary amounts. 
JM> Again, ask the question "What time will it be 4 hours 30 minutes after 
8:00?".  In this case, $time1 has type xs:time in both examples and 
contains the time 08:00.  $dur1 has either type xs:duration or type 
xdt:dayTimeDuration, both representing 4 hours 30 minutes. 
(3) fn:add-seconds-to-time ( $time1, fn:get-seconds-from-duration ( $dur1 
) ) 
(4) $time1 + $dur1 
Again, I feel that the second example would generally be preferred by most 
query authors. 
MK> I agree that there is a need for special data types to represent dates 
and (probably) times, and that this creates a need for special functions 
to do arithmetic on dates. In this case I am not at all convinced that 
overloading the "+" operator is a good idea. There are two reasons why (4) 
looks more appealing than (3). One reason is that you have chosen to start 
with a duration represented as an xs:duration, whereas I think it will be 
equally common (whether the Software AG proposal is accepted or not) to 
start with a duration represented as a number. (I don't think I have every 
used an interval data type in SQL; I have always used numbers.) The second 
reason is that our operators are generally one character while our 
function names can be 28 characters or more. If we started with a numeric 
duration, and used shorter function names, then these two examples might 
read as:
(3') later($time1, $dur1) 
(4') $time1 + seconds($dur1) 
where I think there is no practical difference in usability. 

JM> Now, in [2], Jeni Tennison has correctly observed that F&O fails to 
provide for the obvious operations of 
division of an xdt:yearMonthDuration by another 
xdt:yearMonthDuration and division of an xdt:dayTimeDuration by another 
xdt:dayTimeDuration, as well as subtraction of two dates from each other 
to get a xdt:yearMonthDuration (rather than a xdt:dayTimeDuration)? 
That omission is certainly something that could be corrected (even though 
I note in passing that the SQL standard does not provide for division of 
one duration/INTERVAL by another). 
MK> In my view there's an infinite number of functions on durations that 
can only be achieved by converting them to numbers. Patching up the gaps 
is pointless, there will always be more. 

JM> But I am not so comfortable with Mike's need to compute average speeds 
by dividing a number by a duration --- even though that is made easier by 
his proposal and the equivalent using existing F&O capabilities is rather 
tedious. By the way, I think the two approaches to solving this problem 
would look something like this.  In these examples, $dur1 is either an 
xdt:dayTimeDuration or an xs:duration, and $dist is an xs:double. 
(5) $dist div fn:get-seconds-from-duration ( $dur1 ) 
(6) $dist div ( ( ( fn:get-days-from-dayTimeDuration ( $dur1 ) * 24 + 
                    fn:get-hours-from-dayTimeDuration ( $dur1 ) ) * 60 + 
                    fn:get-minutes-from-dayTimeDuration ( $dur1 ) ) * 60 + 

                    fn:get-seconds-from-dayTimeDuration ( $dur1 ) ) * 60 
It's very clear that Mike's alternative is much shorter and easier to 
understand.  However, one must ask whether this is a typical question that 
will be asked using durations.  In some business, it will be; in others, 
it will not be.  I suggest that, in businesses where such a question is 
common, it is easy enough to write a user-defined function 
(my:convert-dayTimeDuration-to-seconds, for example) to be used by every 
query needing this sort of computation, which would reduce example (6) to:
(7) $dist div my:convert-dayTimeDuration-to-seconds ( $dur1 ) 
And (7) is not significantly different from (5). 
MK> I actually started with a real-life problem of comparing performance 
measurements on software products, and decided to translate it to 
something that looked less parochial.
JM> We come now to my final source of discomfort, and this is one that I 
do not believe can be papered over by workarounds.  In Mike's proposal, he 
deletes the two newly-defined subtypes of xs:duration, 
xdt:yearMonthDuration and xdt:dayTimeDuration.  A prime reason for the 
creation of those two subtypes is that each of them is "totally ordered", 
meaning that one can compare two values of one type and unambiguously 
determine whether the first is greater than the second, or less than it, 
or equal to it.  By contrast, one cannot do that with values of the 
xs:duration type in the general case.  I believe that the same deficiency 
would result from Mike's proposal, leaving the user to deal with the 
consequences. 
MK> Yes. Our two data types do nothing to solve the problem of managing 
complex durations. They try to wish the problem away. One of the 
fundamental insights that came from my colleagues in Darmstadt was that 
the justification for duration data types is because durations are 
difficult, but we have restricted ourselves to handling only those 
durations that are easy, in which case the justification for having 
special data types disappears.
JM> Consider two values of type xs:duration under Mike's proposal.  One of 
these, $dur1, is (effectively) constructed from fn:make-duration(2, 0) and 
the other, $dur2, from fn:make-duration(1, 30*24*60*60).  What is the 
result of comparing $dur1 and $dur2?  It is not possible to determine this 
unambiguously.  The reason is simple: We do not know how many days there 
are in a month!  Some months have 31 days, others have either 28 or 29 
days (depending on the year involved).  Therefore, the expression 
"30*24*60*60" may or may not equal 1 month.  Consequently, the type 
xs:duration is *not* totally ordered.  That means, of course, that some 
pairs of values, such as fn:make-duration(2, 0) and $dur2, from 
fn:make-duration(1, 1*24*60*60), can be compared without ambiguity (2 
months is greater than 1 month and 1 day), while others cannot. 
The existing F&O specification that introduces and uses the two xdt: types 
avoid the mixing of year-month durations and day-time durations 
specifically because of this problem.  In Mike's proposal, he suggests (in 
changes to Annex C.5) that one could define comparison of two values of 
type xs:duration by comparing their months values and also comparing their 
seconds values and require that both must be equal in order for the result 
of the comparison to indicate equality. 
But he admits that greater than and less than comparisons are more 
problematic because of the partially-ordered nature of xs:duration.  He 
suggests a pragmatic solution that uses 
365.242199 as the average 
number of days in a year.  That works well when the values involves 
are intended for statistical use (e.g., over thousands, probably even 
hundreds, of years), but they don't work nearly as well for computations 
involving one year or two years or 1 month! 
MK> The key point of my proposal in this area is that mixed durations are 
indeed problematic, and that our two new data types do nothing to help the 
user in handling them, and because they don't help with the problem, they 
serve no useful purpose. By contrast, if complex durations are represented 
as a pair of numbers, there are various algorithms the user could employ 
to make them tractable, some of which might meet the needs of some 
applications. If you've got a pair of numbers, the problem is probably 
easier to deal with than if you've got a (xdt:yearMonthDuration, 
xdt:dayMonthDuration) pair.

JM> In short (in case I have not been clear), I cannot support, and will 
oppose, acceptance of this proposal on the grounds that it quite often 
makes the application/query authors' lives more difficult, raises the 
probability of errors that will result in support costs borne by vendors, 
and loses the safety provided by the xdt: types, all in the goal of 
simplifying the lives of the authors of the XQuery/XPath suite of 
documents. 

MK> I emphatically do NOT try to justify the proposal on the basis that it 
makes the lives of implementors and specifiers easier. The key point about 
the proposal is that it makes the language significantly smaller without 
removing any useful functionality, and that this actually makes the users' 
lives easier.
Thanks for giving the proposal such careful attention! 
Michael Kay 

Received on Tuesday, 2 December 2003 13:45:31 UTC