Re: XQuery value comparisons (Part I - numerics)

On Mon, Apr 18, 2005 at 07:57:40PM -0700, Howard Katz wrote:
> 
> I have less time available for this than I'd hoped, so I'm going to present
> an extremely pithy look at what I consider the bare, bare essentials of
> XQuery-based comparison semantics and not do a full, feature-by-feature
> comparison against what we're doing in sparql, except in a few instances.
> I'm also going to call this Part I and only look at numerics. Part II (if I
> can find the time to do it, and there's interest) will look at strings,
> dates, times, and the other remaining XML Schema built-in datatypes.
> 
> I'd be grateful for (gentle!) feedback if anyone finds I've made any
> egregious errors in the following. 95% of readers find the following
> information correct 95% of the time. :-)
> 
> Howard
> 
> Value vs. General Comparisons
> ------------------------------------
> XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and general
> (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are used for
> comparing singletons; general comparisons have their raison d'etre in the
> requirements of XPath and are for comparing sequences.

Is that their only use? I've unwittingly focused on value comparisons.
I'd like to hear whether general comparisons apply to us. If they are
only needed for sequence comparison, then I am happy to carry on with
my myopic examination of value comparison.

>                                                        Since sparql doesn't
> have sequences,

http://unagi/2001/sw/DataAccess/rq23/#StandardOperations
[[
Unlike XPath/XQuery, SPARQL functions do not process node
sequences. When interpreting the semantics of XPath functions, assume
that each argument is a sequence of a single node.
]]

>                 and general comparisons generally devolve into item-by-item
> comparisons using value semantics anyway (with a few exceptions I won't go
> into at the moment), I'm only going to look at value comparisons here. (I
> might look at general comparisons in Part II if there seems to be an
> interest; I haven't decided yet whether there is or isn't something useful
> to be learned from that topic.)
> 
> I'll note that while sparql appears to be using value comparisons (ie,
> singletons only), it uses the operator symbol set from general comparisons.
> I think it's arguable whether this is good, bad, or indifferent; if we
> wanted to be precise, we should probably be using the value-comparisons
> symbol set, but since those symbols seem to be Fortran-based and thus
> likely to be viewed as a wondrous cosmological mystery by anybody under the
> age of 50 or so, :-) I see no problem with using the more familiar (=, !=,

ha!

> >, <, >=, <=) symbols. In the following where I'm talking about value
> comparisons specifically, I'll use the proper value-comparison operators
> from XQuery so as to (hopefully!) not further confuse the issue.
> 
> Atomization
> --------------
> The first step in doing a comparison is atomization, in which each operand
> is reduced to a sequence of atomic values and types. In value comparisons,
> the atomized operands must be either singleton atomic values or the empty
> (null) sequence. Atomizing the literal "2" results in a single value "2" of
> type string. Atomizing an element <e>2</e> without an accompanying schema
> results in a value "2" of type xdt:untypedAtomic. In a value comparison,
> anything that ends up as xdt:untypedAtomic is treated as a string (general
> comparisons do things slightly differently).
> 
> After atomization:
> 
> o If either operand is null, a null is returned.
> o If the cardinality of either operand is > 1, a type error is thrown.
> o Otherwise an xs:boolean result is returned, showing the results of the
> comparison.
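
The three cardinality rules above can be sketched in a few lines. This is only an illustration in Python, not XQuery: Python lists stand in for atomized sequences, None stands in for the empty (null) sequence, and the names VALUE_OPS and value_compare are made up for the example.

```python
import operator

# Hypothetical mapping from XQuery value-comparison names to Python operators.
VALUE_OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}

def value_compare(op, left, right):
    """Compare two atomized operands, each given as a Python list."""
    # An empty operand yields the empty sequence, modelled here as None.
    if len(left) == 0 or len(right) == 0:
        return None
    # More than one item in either operand is a type error.
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison requires singleton operands")
    # Otherwise the result is a plain boolean.
    return VALUE_OPS[op](left[0], right[0])
```

With this sketch, value_compare("lt", [1], [2]) gives a boolean, value_compare("eq", [], [1]) gives None, and value_compare("eq", [1, 2], [1]) raises a TypeError.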
> 
> Once the operands have been atomized, the proper comparison function from
> the Binary Operator table in the Working Draft [3] needs to be identified
> for the two operands. Comparison functions operate on "similar" types; if
> the types of the operands are too dissimilar, a type error is thrown. What
> do I mean by "similar" and "dissimilar" (my own terminology; not part of the
> formal specification)? Similarity means that both operands must be of the
> same type to begin with or can be converted to be of the same type through
> either type promotion [4] or subtype substitution [5] (see below).
> 
> First, here's a counter-example: strings and any form of numeric are
> dissimilar. The query:
> 
>      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to xs:string"
> 
> throws a type error because strings can't be compared to numerics.
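
A rough sketch of that "similar vs. dissimilar" check, again in Python rather than XQuery. Operands are modelled as (value, type name) pairs; the function name value_lt and the NUMERIC set are assumptions for the example, and only the four base numeric types are covered.

```python
# The four base numeric types; derived types are left out of this sketch.
NUMERIC = {"xs:integer", "xs:decimal", "xs:float", "xs:double"}

def value_lt(a, b):
    """Sketch of 'lt' on two atomized singletons given as (value, type) pairs."""
    def normalise(item):
        value, type_name = item
        # In a value comparison, untyped data is treated as a string.
        if type_name == "xdt:untypedAtomic":
            return str(value), "xs:string"
        return value, type_name

    (a_val, a_type), (b_val, b_type) = normalise(a), normalise(b)
    both_numeric = a_type in NUMERIC and b_type in NUMERIC
    # Dissimilar types (for example numeric vs. string) are a type error.
    if not both_numeric and a_type != b_type:
        raise TypeError(f"XPTY0004: Cannot compare {a_type} to {b_type}")
    return a_val < b_val
```

Here value_lt((1, "xs:integer"), ("2", "xs:string")) raises the type error, while an untyped "1" compared against the string "2" is handled as a string comparison.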
> 
> Numeric comparisons
> -------------------------
> On the other hand,
> 
>     1 lt 2.0    => false

false? Really? Didn't numeric type promotion (NTP) move the args into a
domain where 1 < 2?

> compares a numeric (xs:integer) against a numeric (xs:decimal) using the
> numeric comparison function, op:numeric-less-than( a, b ).
> 
> Numeric comparisons allow the greatest degree of operand dissimilarity,
> since there are actually sixteen numeric subtypes in the XML Schema built-in
> datatypes hierarchy [6] that can be passed in as arguments to numeric
> functions such as op:numeric-less-than() above.
> 
> There are actually four sub-varieties of numeric functions for each op:
> function: one version to handle floats, one to handle doubles, one to handle
> decimals, and one to handle integers. If other datatypes are to be compared,
> they need to be converted to one of these four types first. The algorithm
> for doing the conversion can be presented as a multiway "if" statement.
> Assuming that both operands are numeric:
> 
> if either of the two operands is of type double
>        convert the other to double and call the appropriate compare-doubles() function
> else if either of the operands is of type float
>        convert the other to float and call the appropriate compare-floats() function
> else if either of the operands is of type decimal
>        convert the other to decimal and call the appropriate compare-decimals() function
> else
>        convert both to integer (if necessary) and call the appropriate compare-integers() function
> 
> In the case of
> 
>      1 lt 2.0
> 
> for example (xs:integer vs xs:decimal), the compare-decimals() version of
> op:numeric-less-than() ends up getting called.
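
Here is a minimal Python sketch of that dispatch, assuming the XQuery promotion order integer, decimal, float, double (so a float is promoted to double, never the reverse). The names ORDER, CONVERT, common_type and numeric_less_than are invented for the illustration; Python's own float is a double, so single precision is emulated with struct.

```python
import struct
from decimal import Decimal

# Assumed promotion order: both operands are brought to whichever of their
# two types comes later in this list.
ORDER = ["integer", "decimal", "float", "double"]

def to_float32(x):
    """Round x to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

# Hypothetical converters standing in for promotion and substitution.
CONVERT = {
    "integer": int,
    "decimal": Decimal,
    "float": to_float32,
    "double": float,
}

def common_type(left_type, right_type):
    """Pick the type both operands are converted to before comparing."""
    return max(left_type, right_type, key=ORDER.index)

def numeric_less_than(left, left_type, right, right_type):
    """Return the variant chosen and the result of the comparison."""
    target = common_type(left_type, right_type)
    convert = CONVERT[target]
    return target, convert(left) < convert(right)
```

For the 1 lt 2.0 case, numeric_less_than(1, "integer", Decimal("2.0"), "decimal") selects the decimal variant and the comparison comes out true.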
> 
> Type Promotion
> --------------------
> The word "converts" in the algorithm refers to both the mechanisms of type
> promotion [4] and subtype substitution [5], depending on what the source and
> target numeric types are. If a float is being converted to a double, for
> example, type promotion is used. Decimals (or any type derived from decimal)
> can also be promoted to either double or float. (I find the term "promotion"
> here a bit misleading when talking about doubles and floats, since to me
> promotion seems to imply movement or casting "up" a type hierarchy. Floats
> and doubles however are at the same level in the XML Schema built-in type
> hierarchy [6], and neither is superior or subordinate to the other in terms
> of derivation.)

I assumed the justification had to do with binary representations and
that decimals (whatever they are) could be represented as (fit in)
floats and floats could fit in doubles. This rationale suggests that
the least constrained subtype of decimal fits in a float, which I
don't know to be true.
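
A quick way to probe the "fits in" question, sketched in Python under the assumption that IEEE 754 single and double precision are reasonable stand-ins for xs:float and xs:double. The helper to_float32 is made up for the example.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

tenth = Decimal("0.1")
# Decimal(float) converts exactly, so any difference here is rounding
# introduced by the cast from decimal to a binary floating-point type.
double_is_exact = Decimal(float(tenth)) == tenth
float_is_exact = Decimal(to_float32(tenth)) == tenth

# 2**24 + 1 needs 25 significant bits; single precision carries only 24.
big = 2**24 + 1
fits_in_float = to_float32(big) == big
fits_in_double = float(big) == big
```

Both exactness flags come out False, and 2**24 + 1 survives a round trip through a double but not through a float. So promotion from decimal is a conversion to the nearest representable value rather than a lossless widening.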

> There's a second variety of type promotion in XQuery where any value of type
> xs:anyURI can be promoted to string, so that any operator that compares
> strings can take an xs:anyURI type of argument.

I had something analogous:
[[
For functions and operators where the expected type is specified as
numeric, untyped literals are cast to xs:double.
]]
but commented it out. In fact, I think my implementation
automatically casts them to string when needed. Thus, the above line
should go back in, but s/xs:double/xs:string/ .

> Subtype Substitution
> -------------------------
> Subtype substitution results when a subtype is used where its supertype is
> required. In the last branch of the above "if" statement, any numeric type
> that's subordinate to xs:decimal can be used as an argument to the
> appropriate compare-decimals() function, and anything subordinate to
> xs:integer can be used as an argument to compare-integers(). For example, in
> the comparison
> 
>      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
> 
> both operands are passed into the compare-integers() version of
> op:numeric-less-than() and compared as integers without casting.
> xs:integer is the lowest common type that is superordinate to both
> nonPositiveInteger and nonNegativeInteger.
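
The "lowest common superordinate type" lookup can be sketched as a walk up the derivation tree. The PARENT table below follows the XML Schema built-in hierarchy for the decimal-derived types; the function names are assumptions for the illustration, and the last function reflects the assumption that only integer and decimal variants of the comparison exist on this branch.

```python
# Each decimal-derived built-in type mapped to the type it is derived from.
PARENT = {
    "integer": "decimal",
    "nonPositiveInteger": "integer",
    "negativeInteger": "nonPositiveInteger",
    "long": "integer",
    "int": "long",
    "short": "int",
    "byte": "short",
    "nonNegativeInteger": "integer",
    "unsignedLong": "nonNegativeInteger",
    "unsignedInt": "unsignedLong",
    "unsignedShort": "unsignedInt",
    "unsignedByte": "unsignedShort",
    "positiveInteger": "nonNegativeInteger",
}

def ancestors(type_name):
    """Return type_name followed by its supertypes, nearest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_supertype(a, b):
    """Find the nearest type that both a and b can be substituted for."""
    b_chain = ancestors(b)
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    raise TypeError(f"no common supertype for {a} and {b}")

def comparison_variant(a, b):
    """Round the common supertype up to integer or decimal."""
    common = lowest_common_supertype(a, b)
    while common not in ("integer", "decimal"):
        common = PARENT[common]
    return common
```

For nonPositiveInteger and nonNegativeInteger the walk stops at integer, which matches the compare-integers() choice described above.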

A difference I recall about NTP vs. subtype substitution is that NTP
actually changes the type of the argument rather than harmlessly
upcasting it to pass it to a function. The related spec text*:

[[
XML Schema [] defines a set of types derived from decimal: integer;
nonPositiveInteger; negativeInteger; long; int; short; byte;
nonNegativeInteger; unsignedLong; unsignedInt; unsignedShort;
unsignedByte and positiveInteger. These are all treated as decimals
for arithmetic operations. SPARQL does not specifically require
integrity checks on derived subtypes. SPARQL has no numeric type test
operators so the distinction between a primitive type and a type
derived from that primitive type is unobservable.
]]
* just fixed 30s ago.

> Depending on the types of the operands, several conversions might be done
> internally to bring both operands to a common (or similar) type. For
> example,
> 
>     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
> 
> first up-converts the xs:short to xs:decimal using subtype substitution, and
> then type-promotes the xs:decimal to xs:double, so that the
> compare-doubles() version of op:numeric-less-than() can be used. The
> implementation needn't go through all the intermediate stages if it can be
> done more directly (I believe).
> 
> That's it for the moment ...
> 
> -------------------------------------------------------------------------------
> [1] http://www.w3.org/TR/xquery/#id-value-comparisons
> [2] http://www.w3.org/TR/xquery/#id-general-comparisons
> [3] http://www.w3.org/TR/xquery/#mapping
> [4] http://www.w3.org/TR/xquery/#dt-type-promotion
> [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
> [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
> 
> 

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Thursday, 21 April 2005 00:01:00 UTC