RE: XQuery semantics: aggregations from Peter McIlroy on 2001-11-05 (www-xml-query-comments@w3.org from November 2001)

From: Peter McIlroy <PeterM@nimble.com>
Date: Mon, 5 Nov 2001 15:35:08 -0800
To: "'www-xml-query-comments@w3.org'" <www-xml-query-comments@w3.org>
Cc: Peter McIlroy <PeterM@nimble.com>, Denise Draper <ddraper@nimble.com>, "'simeon@research.bell-labs.com'" <simeon@research.bell-labs.com>
Message-ID: <6514DE680737F449885673E7895ED25001211B13@zeus.nimble.com>

I'm forwarding this to the newsgroup on the advice of Jerome Simeon.

The definition of XML arithmetic and aggregate functions in the XQuery
proposal is troubling.  In particular, the current specifications are for:

 double SUM [ T ]
 double AVG [ T ]
for all types T.

After having my head straightened by the QL compiler people here at Nimble,
I believe that the best model for aggregate functions and operators is that
all aggregates and functions will have the same type as their arguments:

	T SUM [ T ]
	T AVG [ T ]
	T MIN [ T ]  (as it stands)
	T MAX [ T ]  (as it stands)

		

There are several reasons for this:

(1) All aggregate functions are treated uniformly with other numeric
    operators.
(2) It is SQL compatible, so it satisfies the principle of least surprise.
(3) For DECIMAL types, the domain agrees with the range.  The current
    proposal, that avg and sum always be type double, may cause serious
    roundoff errors for DECIMAL numbers.  This may cause some concern in
    the financial industry.
(4) It is a general model: if the result is desired in some other type, it
    can be requested explicitly, as in SQL:

CREATE TABLE test(fld1 INTEGER);
INSERT INTO test VALUES(...);
SELECT AVG(real(fld1)) from test;
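
To make reasons (3) and (4) concrete, here is a minimal sketch in Python,
which is not part of the proposal.  The helper names typed_sum and typed_avg
are hypothetical.  The sketch assumes a "T SUM [ T ]" style aggregate that
only uses the element type's own addition, and it covers decimal and
floating-point inputs only (an integer average would need its own rule).

```python
from decimal import Decimal

def typed_sum(values):
    """Hypothetical T SUM [ T ]: add with the element type's own '+'."""
    it = iter(values)
    total = next(it)          # start from the first element, not from a float 0
    for v in it:
        total = total + v
    return total

def typed_avg(values):
    """Hypothetical T AVG [ T ] for Decimal or float inputs."""
    values = list(values)
    return typed_sum(values) / len(values)

# Decimal in, Decimal out: ten payments of 0.1 add up to exactly 1.0.
exact = typed_sum([Decimal("0.1")] * 10)
print(type(exact).__name__, exact == Decimal("1.0"))

# The same data forced through binary floating point picks up roundoff.
approx = typed_sum([0.1] * 10)
print(type(approx).__name__, approx == 1.0)

# If a double result is wanted, cast the inputs, not the result.
as_double = typed_sum(float(x) for x in [1, 2, 3])
print(type(as_double).__name__, as_double)
```

The last call mirrors the SQL example above: the conversion is applied to
the column values, and the aggregate simply returns what it was given.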


Additionally, the proposed rounding methods may prove insufficiently
general.  The proposal is for:
	INT CEILING(T);
	INT FLOOR(T);
	INT ROUND(T);

The SQL standard is more like:

	T CEILING(T), T is DOUBLE or DECIMAL (to avoid overflow)
	T FLOOR(T), T is DOUBLE or DECIMAL
	T ROUND(T, int precision) -- T is any numeric type
		precision is the position at which to round.  Default is 0.
		For integers, only negative precision has any effect.
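
As an illustration of the "T ROUND(T, int precision)" shape for integers,
the following sketch uses Python's built-in round, which happens to behave
this way for int arguments.  The wrapper name round_int is hypothetical, and
the way ties are broken here is Python's own rule, not a statement about SQL
or XQuery.

```python
def round_int(value: int, precision: int = 0) -> int:
    """Round an integer at a decimal position; the result stays an int."""
    return round(value, precision)

print(round_int(1234, -2))   # rounds at the hundreds position
print(round_int(1287, -2))   # rounds up at the hundreds position
print(round_int(1234, 2))    # non-negative precision leaves an int unchanged
```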


You may want to include TRUNC in this list, or to follow a model more like
IEEE:

	T ROUND(T arg, int precision = 0, string direction = 'nearest')
		where direction can be
			'negative' (towards -Inf)
			'positive' (towards +Inf)
			'tozero' (truncate towards zero)
			'nearest' (round to nearest, with .5 treated in some
				uniform manner, either always away from zero,
				towards zero, or IEEE towards even. The current
				model, towards positive infinity, does not
				agree with)

		See: "man fpsetround" on Solaris for more details of IEEE
		rounding.
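
A rough sketch of the directed ROUND signature above, written in Python with
the standard decimal module.  The function name round_to and the mapping of
the four direction strings onto decimal's rounding modes are assumptions made
for illustration.  'nearest' is mapped to ties-to-even here, which is only
one of the uniform choices mentioned above.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# Assumed mapping from the direction names to decimal's rounding modes.
_MODES = {
    "negative": ROUND_FLOOR,      # towards -Inf
    "positive": ROUND_CEILING,    # towards +Inf
    "tozero": ROUND_DOWN,         # truncate towards zero
    "nearest": ROUND_HALF_EVEN,   # nearest, ties to even
}

def round_to(arg: Decimal, precision: int = 0,
             direction: str = "nearest") -> Decimal:
    """Round arg at the given decimal position; the result stays a Decimal."""
    quantum = Decimal(1).scaleb(-precision)   # e.g. precision=2 gives 0.01
    return arg.quantize(quantum, rounding=_MODES[direction])

for d in ("negative", "positive", "tozero", "nearest"):
    print(d, round_to(Decimal("-2.5"), 0, d))
```

With this shape, TRUNC is simply direction 'tozero', and CEILING and FLOOR
are 'positive' and 'negative' at precision 0.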


Sincerely, 
	Peter McIlroy

pmcilroy@nimble.com


Peter McIlroy writes:
 > Thanks.
 > 
 > There's still some problem.
 > 
 > I don't think that the 
 > 
 > SUM [ DECIMAL ] --> double
 > or
 > AVG [ DECIMAL ] --> double
 > 
 > is a safe coercion in types.
 > 
 > For example, AVG[.2, .2] = .2, not some floating point approximation to
 > .2.
 > 
 > Also, if you treat SUM [DECIMAL] as a floating point, the entire
 > financial database community will be unhappy.
 > 
 > 
 > I've been talking with the XMLQL compiler people here, who have been
 > working on ways to make XML-based views on disparate data sources.
 > 
 > They say that the best way to go is that the aggregation functions
 > take the same type as their arguments.
 > 
 > They recommend that all functions be templatized as follows:
 > 
 > <T> T MAX [ T ]
 > <T> T SUM [ T ]
 > <T> T AVG [ T ]
 > <T> T MIN [ T ]
 > 
 > There's more flexibility and simplicity in making the functions
 > polymorphic than in requiring them to have only one return type.
 > Then if you do want to compute a sum of integers as a double, you do
 > the cast on the column value, not on the result:
 > 
 > SUM [ double(integer-column) ]
 > 
 > Also, I am pleased to see that you are proposing to use David Gay's
 > improved version of the Steele & White stopping criteria for conversion
 > of floating point numbers to decimal.  It really is the only right
 > solution for this problem.
 > 
 > 
 > -----Original Message-----
 > From: Jerome Simeon [mailto:simeon@research.bell-labs.com]
 > Sent: Friday, November 02, 2001 5:29 PM
 > To: Peter McIlroy
 > Cc: 'simeon@research.bell-labs.com'
 > Subject: Re: XQuery semantics: aggregations
 > 
 > 
 > 
 > Hi Peter,
 > 
 > Most of the arithmetic operations are now defined as a part of the
 > Functions and Operators document for XQuery:
 > 
 > XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0
 > http://www.w3.org/TR/xquery-operators/
 > 
 > Which means those from the semantics document should probably be
 > revisited.
 > 
 > Let me know if the F&O document addresses your issues.
 > 
 > Regards,
 > - Jerome
 >

Received on Monday, 5 November 2001 18:34:28 UTC