XQuery value comparisons (Part I - numerics)

I have less time available for this than I'd hoped, so I'm going to present
an extremely pithy look at what I consider the bare, bare essentials of
XQuery-based comparison semantics and not do a full, feature-by-feature
comparison against what we're doing in SPARQL, except in a few instances.
I'm also going to call this Part I and only look at numerics. Part II (if I
can find the time to do it, and there's interest) will look at strings,
dates, times, and the other remaining XML Schema built-in datatypes.

I'd be grateful for (gentle!) feedback if anyone finds I've made any
egregious errors in the following. 95% of readers find the following
information correct 95% of the time. :-)

Howard

Value vs. General Comparisons
------------------------------------
XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and general
(=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are used for
comparing singletons; general comparisons have their raison d'etre in the
requirements of XPath and are for comparing sequences. Since SPARQL doesn't
have sequences, and general comparisons generally devolve into item-by-item
comparisons using value semantics anyway (with a few exceptions I won't go
into at the moment), I'm only going to look at value comparisons here. (I
might look at general comparisons in Part II if there seems to be an
interest; I haven't decided yet whether there is or isn't something useful
to be learned from that topic.)
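
To make the "item-by-item" remark concrete, here is a minimal Python sketch
(not XQuery). It assumes operands are plain Python lists of numbers and
ignores the exceptions mentioned above; value_lt and general_lt are
hypothetical names invented for this illustration.

```python
from itertools import product

def value_lt(a, b):
    """Stand-in for the value comparison `a lt b` on two singletons."""
    return a < b

def general_lt(seq_a, seq_b):
    """Stand-in for the general comparison `seq_a < seq_b`.

    True if ANY pair of items, one from each sequence, satisfies the
    value comparison; an empty sequence on either side gives False.
    """
    return any(value_lt(a, b) for a, b in product(seq_a, seq_b))

print(general_lt([1, 5], [2]))   # True: the pair (1, 2) satisfies lt
print(general_lt([5, 6], [2]))   # False: no pair satisfies lt
print(general_lt([], [2]))       # False: there are no pairs at all
```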

I'll note that while SPARQL appears to be using value comparisons (ie,
singletons only), it uses the operator symbol set from general comparisons.
I think it's arguable whether this is good, bad, or indifferent; if we
wanted to be precise, we should probably be using the value-comparison
symbol set, but since those symbols seem to be Fortran-based and thus
likely to be viewed as a wondrous cosmological mystery by anybody under the
age of 50 or so, :-) I see no problem with using the more familiar (=, !=,
>, <, >=, <=) symbols. In the following where I'm talking about value
comparisons specifically, I'll use the proper value-comparison operators
from XQuery so as to (hopefully!) not further confuse the issue.

Atomization
--------------
The first step in doing a comparison is atomization, in which each operand
is reduced to a sequence of atomic values and types. In value comparisons,
the atomized operands must be either singleton atomic values or the empty
(null) sequence. Atomizing the literal "2" results in a single value "2" of
type string. Atomizing an element <e>2</e> without an accompanying schema
results in a value "2" of type xdt:untypedAtomic. In a value comparison, an
operand of type xdt:untypedAtomic is cast to xs:string (general comparisons
do things slightly differently).

After atomization:

o If either operand is null, a null is returned.
o If the cardinality of either operand is > 1, a type error is thrown.
o Otherwise an xs:boolean result is returned, showing the results of the
comparison.
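
The three rules above can be sketched in Python as follows. This is only an
illustration under stated assumptions: atomized operands are modelled as
Python lists, the empty list stands for the empty sequence, and
XQueryTypeError and value_compare are hypothetical names. The sketch checks
cardinality before emptiness; which check comes first when one operand is
empty and the other has several items is an assumption of this sketch.

```python
class XQueryTypeError(Exception):
    """Hypothetical stand-in for an XQuery type error."""

def value_compare(left, right, compare):
    """Apply the post-atomization rules to two atomized operands.

    `left` and `right` are lists of atomic values; `compare` is the
    comparison function already chosen for their types.
    """
    if len(left) > 1 or len(right) > 1:
        raise XQueryTypeError("operand is not a singleton")
    if not left or not right:
        return []                          # empty in, empty out
    return [compare(left[0], right[0])]    # singleton boolean result

def less_than(a, b):
    return a < b

print(value_compare([1], [2], less_than))  # [True]
print(value_compare([], [2], less_than))   # []
```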

Once the operands have been atomized, the proper comparison function from
the Binary Operator table in the Working Draft [3] needs to be identified
for the two operands. Comparison functions operate on "similar" types; if
the types of the operands are too dissimilar, a type error is thrown. What
do I mean by "similar" and "dissimilar" (my own terminology; not part of the
formal specification)? Similarity means that both operands must be of the
same type to begin with or can be converted to be of the same type through
either type promotion [4] or subtype substitution [5] (see below).

First, here's a counter-example: strings and any form of numeric are
dissimilar. The query:

     1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to xs:string"

throws a type error because strings can't be compared to numerics.

Numeric comparisons
-------------------------
On the other hand,

    1 lt 2.0    => true

compares a numeric (xs:integer) against a numeric (xs:decimal) using the
numeric comparison function, op:numeric-less-than( a, b ).

Numeric comparisons allow the greatest degree of operand dissimilarity,
since there are actually sixteen numeric subtypes in the XML Schema built-in
datatypes hierarchy [6] that can be passed in as arguments to numeric
functions such as op:numeric-less-than() above.

Each numeric op: function actually comes in four sub-varieties: one version
to handle floats, one to handle doubles, one to handle decimals, and one to
handle integers. If other datatypes are to be compared,
they need to be converted to one of these four types first. The algorithm
for doing the conversion can be presented as a multiway "if" statement.
Assuming that both operands are numeric:

if either of the two operands is of type double
       convert the other to double and call the appropriate
       compare-doubles() function
else if either of the operands is of type float
       convert the other to float and call the appropriate
       compare-floats() function
else if either of the operands is of type decimal
       convert the other to decimal and call the appropriate
       compare-decimals() function
else
       convert both to integer (if necessary) and call the appropriate
       compare-integers() function

In the case of

     1 lt 2.0

for example (xs:integer vs xs:decimal), the compare-decimals() version of
op:numeric-less-than() ends up getting called.
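
As a rough illustration of that dispatch, here is a minimal Python sketch. It
relies on XQuery's promotion rules, under which a float can be promoted to a
double (not the reverse) and a decimal to either, so the common type is the
"widest" of integer, decimal, float, double. The names RANK,
common_numeric_type, to_common and numeric_less_than are hypothetical, typed
values are modelled as (type name, Python value) pairs, and operands are
assumed to have already been reduced to one of the four base types.

```python
from decimal import Decimal

# The four types that have their own comparison function, ordered so
# that conversion only ever moves towards the higher rank.
RANK = {"xs:integer": 0, "xs:decimal": 1, "xs:float": 2, "xs:double": 3}

def common_numeric_type(type_a, type_b):
    """Return the type whose comparison function handles both operands."""
    return type_a if RANK[type_a] >= RANK[type_b] else type_b

def to_common(value, target):
    """Convert a Python value to a stand-in for the `target` type."""
    if target == "xs:integer":
        return int(value)
    if target == "xs:decimal":
        return Decimal(value)
    # Python has no single-precision type, so float() stands in for
    # both xs:float and xs:double in this sketch.
    return float(value)

def numeric_less_than(a, b):
    """Return (type used for the comparison, result of a < b)."""
    (type_a, val_a), (type_b, val_b) = a, b
    target = common_numeric_type(type_a, type_b)
    return target, to_common(val_a, target) < to_common(val_b, target)

# 1 lt 2.0: xs:integer vs xs:decimal, compared as decimals
print(numeric_less_than(("xs:integer", 1), ("xs:decimal", Decimal("2.0"))))
# ('xs:decimal', True)
```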

Type Promotion
--------------------
The word "convert" in the algorithm refers to both the mechanisms of type
promotion [4] and subtype substitution [5], depending on what the source and
target numeric types are. If a float is being converted to a double, for
example, type promotion is used. Decimals (or any type derived from decimal)
can also be promoted to either double or float. (I find the term "promotion"
here a bit misleading when talking about doubles and floats, since to me
promotion seems to imply movement or casting "up" a type hierarchy. Floats
and doubles however are at the same level in the XML Schema built-in type
hierarchy [6], and neither is superior or subordinate to the other in terms
of derivation.)

There's a second variety of type promotion in XQuery where any value of type
xs:anyURI can be promoted to string, so that any operator that compares
strings can take an xs:anyURI type of argument.

Subtype Substitution
-------------------------
Subtype substitution results when a subtype is used where its supertype is
required. In the last two branches of the above "if" statement, any numeric
type that's subordinate to xs:decimal can be used as an argument to the
appropriate compare-decimals() function, and anything subordinate to
xs:integer can be used as an argument to compare-integers(). For example, in
the comparison

     xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )

both operands are passed into the compare-integers() version of
op:numeric-less-than() and compared as integers without casting.
xs:integer is the lowest common type that is superordinate to both
xs:nonPositiveInteger and xs:nonNegativeInteger.
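
Finding that lowest common type amounts to walking up the derivation
hierarchy. The Python sketch below is an illustration only, not how any
particular XQuery processor does it: the PARENT table transcribes the
xs:decimal branch of the XML Schema built-in hierarchy [6], and ancestors
and lowest_common_supertype are hypothetical helper names.

```python
# Immediate supertype of each type in the xs:decimal branch.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
    "xs:unsignedByte": "xs:unsignedShort",
    "xs:positiveInteger": "xs:nonNegativeInteger",
}

def ancestors(type_name):
    """Return the type itself followed by its supertypes, nearest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_supertype(type_a, type_b):
    """Return the nearest type that both arguments derive from, or None."""
    candidates = set(ancestors(type_b))
    for candidate in ancestors(type_a):
        if candidate in candidates:
            return candidate
    return None

print(lowest_common_supertype("xs:nonPositiveInteger", "xs:nonNegativeInteger"))
# xs:integer
```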

Depending on the types of the operands, several conversions might be done
internally to bring both operands to a common (or similar) type. For
example,

    xs:double( 3.14159e0 ) lt xs:short( 4 ) => true

first up-converts the xs:short to xs:decimal using subtype substitution, and
then type-promotes the xs:decimal to xs:double, so that the
compare-doubles() version of op:numeric-less-than() can be used. The
implementation needn't go through all the intermediate stages if it can be
done more directly (I believe).
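
The logical chain of steps for that example can be traced with a small Python
sketch. It is illustrative only: DERIVED_FROM lists just the derivation links
needed to get from xs:short to xs:decimal, PROMOTES_TO lists the numeric
promotions described above, and conversion_steps is a hypothetical helper
that spells out every intermediate stage even though a real implementation
may well skip them.

```python
# Derivation links used for subtype substitution (xs:short branch only).
DERIVED_FROM = {
    "xs:short": "xs:int",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
}

# Numeric type promotions.
PROMOTES_TO = {
    "xs:decimal": ["xs:float", "xs:double"],
    "xs:float": ["xs:double"],
}

def conversion_steps(source, target):
    """List the (from, to, mechanism) steps leading from source to target."""
    steps = []
    current = source
    while current != target:
        if target in PROMOTES_TO.get(current, []):
            steps.append((current, target, "type promotion"))
            current = target
        elif current in DERIVED_FROM:
            parent = DERIVED_FROM[current]
            steps.append((current, parent, "subtype substitution"))
            current = parent
        else:
            raise ValueError(f"cannot convert {source} to {target}")
    return steps

for step in conversion_steps("xs:short", "xs:double"):
    print(step)
```

The last step printed is the single promotion from xs:decimal to xs:double;
every step before it is a subtype substitution.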

That's it for the moment ...

----------------------------------------------------------------------------
[1] http://www.w3.org/TR/xquery/#id-value-comparisons
[2] http://www.w3.org/TR/xquery/#id-general-comparisons
[3] http://www.w3.org/TR/xquery/#mapping
[4] http://www.w3.org/TR/xquery/#dt-type-promotion
[5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
[6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes

Received on Tuesday, 19 April 2005 02:57:46 UTC