RE: XQuery value comparisons (Part I - numerics)

 > -----Original Message-----
 > From: Eric Prud'hommeaux [mailto:eric@w3.org]
 > Sent: Wednesday, April 20, 2005 5:01 PM
 > To: Howard Katz
 > Cc: RDF Data Access Working Group
 > Subject: Re: XQuery value comparisons (Part I - numerics)
 >
 >
 > On Mon, Apr 18, 2005 at 07:57:40PM -0700, Howard Katz wrote:
 > >
 > > I have less time available for this than I'd hoped, so I'm going to present
 > > an extremely pithy look at what I consider the bare, bare essentials of
 > > XQuery-based comparison semantics and not do a full, feature-by-feature
 > > comparison against what we're doing in sparql, except in a few instances.
 > > I'm also going to call this Part I and only look at numerics. Part II (if I
 > > can find the time to do it, and there's interest) will look at strings,
 > > dates, times, and the other remaining XML Schema built-in datatypes.
 > >
 > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
 > > egregious errors in the following. 95% of readers find the following
 > > information correct 95% of the time. :-)
 > >
 > > Howard
 > >
 > > Value vs. General Comparisons
 > > ------------------------------------
 > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and general
 > > (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are used for
 > > comparing singletons; general comparisons have their raison d'etre in the
 > > requirements of XPath and are for comparing sequences.
 >
 > Is that their only use? I've unwittingly focused on value comparisons.
 > I'd like to hear that general comparisons apply to us. If they are
 > only needed for sequence comparison, then I am happy to carry on with
 > my myopic examination of value comparison.

Their other use, as already described elsewhere, is to provide a greater
possibility of a successful comparison. They're less rigorous in what they
reject. I believe this originated in XPath 1, tho I'm not sure of the exact
rationale. It's kind of like designing browsers not to fail on crufty (sp?)
html.

 >
 > >                                                 Since sparql doesn't
 > > have sequences,
 >
 > http://unagi/2001/sw/DataAccess/rq23/#StandardOperations
 > [[
 > Unlike XPath/XQuery, SPARQL functions do not process node
 > sequences. When interpreting the semantics of XPath functions, assume
 > that each argument is a sequence of a single node.
 > ]]
 >
 > >                 and general comparisons generally devolve into item-by-item
 > > comparisons using value semantics anyway (with a few exceptions I won't go
 > > into at the moment), I'm only going to look at value comparisons here. (I
 > > might look at general comparisons in Part II if there seems to be an
 > > interest; I haven't decided yet whether there is or isn't something useful
 > > to be learned from that topic.)
 > >
 > > I'll note that while sparql appears to be using value comparisons (ie,
 > > singletons only), it uses the operator symbol set from general comparisons.
 > > I think it's arguable whether this is good, bad, or indifferent; if we
 > > wanted to be precise, we should probably be using the value-comparisons
 > > symbol set, but since those symbols seem to be fortran-based and thus
 > > likely to be viewed as a wondrous cosmological mystery by anybody under the
 > > age of 50 or so, :-) I see no problem with using the more familiar (=, !=,
 >
 > ha!
 >
 > > >, <, >=, <=) symbols. In the following where I'm talking about value
 > > comparisons specifically, I'll use the proper value-comparison operators
 > > from XQuery so as to (hopefully!) not further confuse the issue.
 > >
 > > Atomization
 > > --------------
 > > The first step in doing a comparison is atomization, in which each operand
 > > is reduced to a sequence of atomic values and types. In value comparisons,
 > > the atomized operands must be either singleton atomic values or the empty
 > > (null) sequence. Atomizing the literal "2" results in a single value "2" of
 > > type string. Atomizing an element <e>2</e> without an accompanying schema
 > > results in a value "2" of type xdt:untypedAtomic. If something ends up as
 > > xdt:untypedAtomic, it is treated as a string (general comparisons do things
 > > slightly differently).
 > >
 > > After atomization:
 > >
 > > o If either operand is null, a null is returned.
 > > o If the cardinality of either operand is > 1, a type error is thrown.
 > > o Otherwise an xs:boolean result is returned, showing the results of the
 > > comparison.
 > >
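
To make the three post-atomization rules above concrete, here is a rough Python sketch of my own (not taken from the spec). It assumes the operands have already been atomized into Python lists, uses None to stand for the empty (null) sequence, and uses a made-up XQueryTypeError class in place of a real XQuery type error. The function name value_lt is likewise just for illustration.

```python
from typing import Optional, Sequence


class XQueryTypeError(Exception):
    """Hypothetical stand-in for an XQuery type error such as XPTY0004."""


def value_lt(left: Sequence, right: Sequence) -> Optional[bool]:
    """Apply the three post-atomization rules to an 'lt' value comparison.

    Operands are already-atomized sequences, modelled here as Python lists.
    A result of None models the empty (null) sequence.
    """
    # Rule 1: if either operand is empty, the result is empty.
    if len(left) == 0 or len(right) == 0:
        return None
    # Rule 2: more than one item on either side is a type error.
    if len(left) > 1 or len(right) > 1:
        raise XQueryTypeError("value comparison requires singleton operands")
    # Rule 3: two singletons yield an ordinary boolean result.
    return left[0] < right[0]
```

The rules are checked in the same order as the list above, so an empty operand wins over an over-long one in this sketch.
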
 > > Once the operands have been atomized, the proper comparison function from
 > > the Binary Operator table in the Working Draft [3] needs to be identified
 > > for the two operands. Comparison functions operate on "similar" types; if
 > > the types of the operands are too dissimilar, a type error is thrown. What
 > > do I mean by "similar" and "dissimilar" (my own terminology; not part of the
 > > formal specification)? Similarity means that both operands must be of the
 > > same type to begin with or can be converted to be of the same type through
 > > either type promotion [4] or subtype substitution [5] (see below).
 > >
 > > First, here's a counter-example: strings and any form of numeric are
 > > dissimilar. The query:
 > >
 > >      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to
 > > xs:string"
 > >
 > > throws a type error because strings can't be compared to numerics.
 > >
 > > Numeric comparisons
 > > -------------------------
 > > On the other hand,
 > >
 > >     1 lt 2.0    => false
 >
 > false? really? NTP didn't move the args into a domain where 1 < 2 ?
 >
 > > compares a numeric (xs:integer) against a numeric (xs:decimal) using the
 > > numeric comparison function, op:numeric-less-than( a, b ).
 > >
 > > Numeric comparisons allow the greatest degree of operand dissimilarity,
 > > since there are actually sixteen numeric subtypes in the XML Schema built-in
 > > datatypes hierarchy [6] that can be passed in as arguments to numeric
 > > functions such as op:numeric-less-than() above.
 > >
 > > There are actually four sub-varieties of numeric functions for each op:
 > > function: one version to handle floats, one to handle doubles, one to handle
 > > decimals, and one to handle integers. If other datatypes are to be compared,
 > > they need to be converted to one of these four types first. The algorithm
 > > for doing the conversion can be presented as a multiway "if" statement.
 > > Assuming that both operands are numeric:
 > >
 > > if either of the two operands is of type double
 > >        convert the other to double and call the appropriate
 > >        compare-doubles() function
 > > else if either of the operands is of type float
 > >        convert the other to float and call the appropriate
 > >        compare-floats() function
 > > else if either of the operands is of type decimal
 > >        convert the other to decimal and call the appropriate
 > >        compare-decimals() function
 > > else
 > >        convert both to integer (if necessary) and call the appropriate
 > >        compare-integers() function
 > >
 > > In the case of
 > >
 > >      1 lt 2.0
 > >
 > > for example (xs:integer vs xs:decimal), the compare-decimals() version of
 > > op:numeric-less-than() ends up getting called.
 > >
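
For anyone who'd rather read the multiway "if" as code, here is a small Python sketch of the selection step. It is only an illustration: the type labels and the returned compare-...() names are the informal ones used in this message, not real XQuery function names, and it assumes each operand has already been reduced to one of the four base numeric types. It checks double before float, because in XQuery's type promotion rules xs:float promotes to xs:double and not the other way round.

```python
# Informal labels for the four numeric families, narrowest first.
# xs:float promotes to xs:double in XQuery, so double ranks highest.
PROMOTION_ORDER = ["integer", "decimal", "float", "double"]


def pick_compare_function(left_type: str, right_type: str) -> str:
    """Return the informal name of the comparison variant to call."""
    for type_name in (left_type, right_type):
        if type_name not in PROMOTION_ORDER:
            raise ValueError(f"not one of the four base numeric types: {type_name}")
    # The widest of the two operand types decides which variant is used;
    # the other operand is converted to that type before comparing.
    widest = max(left_type, right_type, key=PROMOTION_ORDER.index)
    return f"compare-{widest}s()"
```

With these labels, an integer against a decimal selects the decimal variant, which matches the 1 lt 2.0 case discussed above.
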
 > > Type Promotion
 > > --------------------
 > > The word "converts" in the algorithm refers to both the mechanisms of type
 > > promotion [4] and subtype substitution [5], depending on what the source and
 > > target numeric types are. If a float is being converted to a double, for
 > > example, type promotion is used. Decimals (or any type derived from decimal)
 > > can also be promoted to either double or float. (I find the term "promotion"
 > > here a bit misleading when talking about doubles and floats, since to me
 > > promotion seems to imply movement or casting "up" a type hierarchy. Floats
 > > and doubles however are at the same level in the XML Schema built-in type
 > > hierarchy [6], and neither is superior or subordinate to the other in terms
 > > of derivation.)
 >
 > I assumed the justification had to do with binary representations and
 > that decimals (whatever they are) could be represented as (fit in)
 > floats and floats could fit in doubles. This rationale suggests that
 > the least constrained subtype of decimal fits in a float, which I
 > don't know to be true.

I believe what you're saying is correct. I don't believe I'm saying that's
not the case. (If you believe me. :-)

 >
 > > There's a second variety of type promotion in XQuery where any value of type
 > > xs:anyURI can be promoted to string, so that any operator that compares
 > > strings can take an xs:anyURI type of argument.
 >
 > I had something analogous:
 > [[
 > For functions and operators where the expected type is specified as
 > numeric, untyped literals are cast to xs:double.
 > ]]
 > but commented it out. In fact, I think my implementation
 > automatically casts them to string when needed. Thus, the above line
 > should go back in, but s/xs:double/xs:string/ .
 >
 > > Subtype Substitution
 > > -------------------------
 > > Subtype substitution results when a subtype is used where its supertype is
 > > required. In the last two branches of the above "if" statement, any numeric
 > > type that's subordinate to xs:decimal can be used as an argument to the
 > > appropriate compare-decimals() function, and anything subordinate to
 > > xs:integer can be used as an argument to compare-integers(). For example, in
 > > the comparison
 > >
 > >      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
 > >
 > > both operands are passed into the compare-integers() version of
 > > op:numeric-less-than() and compared as integers without casting.
 > > xs:integer is the lowest common type that is superordinate to both
 > > nonPositiveInteger and nonNegativeInteger.
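
Here is a rough Python sketch of how the "lowest common type" could be found by walking the derivation tree. The parent table lists the decimal-derived types from the XML Schema built-in hierarchy [6], with the xs: prefix dropped; the function names are my own and purely illustrative.

```python
# Each decimal-derived built-in type mapped to the type it is derived from.
# decimal is the primitive type at the root of this tree, so it has no parent.
PARENT = {
    "decimal": None,
    "integer": "decimal",
    "nonPositiveInteger": "integer",
    "negativeInteger": "nonPositiveInteger",
    "long": "integer",
    "int": "long",
    "short": "int",
    "byte": "short",
    "nonNegativeInteger": "integer",
    "unsignedLong": "nonNegativeInteger",
    "unsignedInt": "unsignedLong",
    "unsignedShort": "unsignedInt",
    "unsignedByte": "unsignedShort",
    "positiveInteger": "nonNegativeInteger",
}


def ancestors(type_name: str) -> list:
    """Return type_name followed by its supertypes, nearest first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def lowest_common_supertype(a: str, b: str) -> str:
    """Walk up from a until reaching a type that b also derives from (or is)."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    raise ValueError(f"no common supertype for {a} and {b}")
```

For nonPositiveInteger and nonNegativeInteger the walk stops at integer, as in the example above. Note that no value is converted along the way: the operands are simply accepted where the supertype is expected.
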
 >
 > A difference I recall about NTP vs. subtype promotion is that NTP
 > actually changes the type of the argument rather than harmlessly
 > upcasting it to pass it as is to a function. The related spec text*:

NTP? I don't know if it's relevant, but much is made in the XQuery text
about subtype substitution *not* actually changing the type of the converted
operand. I've never been sure exactly why that fact was being emphasized.

 > [[
 > XML Schema [] defines a set of types derived from decimal: integer;
 > nonPositiveInteger; negativeInteger; long; int; short; byte;
 > nonNegativeInteger; unsignedLong; unsignedInt; unsignedShort;
 > unsignedByte and positiveInteger. These are all treated as decimals
 > for arithmetic operations. SPARQL does not specifically require
 > integrity checks on derived subtypes. SPARQL has no numeric type test
 > operators so the distinction between a primitive type and a type
 > derived from that primitive type is unobservable.
 > ]]
 > * just fixed 30s ago.
 >
 > > Depending on the types of the operands, several conversions might be done
 > > internally to bring both operands to a common (or similar) type. For
 > > example,
 > >
 > >     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
 > >
 > > first up-converts the xs:short to xs:decimal using subtype substitution, and
 > > then type-promotes the xs:decimal to xs:double, so that the
 > > compare-doubles() version of op:numeric-less-than() can be used. The
 > > implementation needn't go through all the intermediate stages if it can be
 > > done more directly (I believe).
 > >
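
The two-stage conversion in that last example can be mimicked in Python, purely as an analogy: Python's Decimal stands in for xs:decimal and Python's float (an IEEE double) stands in for xs:double. The function name is hypothetical, and a real XQuery processor is of course free to do this differently.

```python
from decimal import Decimal


def double_lt_short(double_value: float, short_value: int) -> bool:
    """Compare a double against a short by way of decimal, as described above."""
    # Stage 1 (subtype substitution): the short is accepted as a decimal.
    as_decimal = Decimal(short_value)
    # Stage 2 (type promotion): the decimal is promoted to a double.
    as_double = float(as_decimal)
    # Both operands are now doubles, so the doubles comparison applies.
    return double_value < as_double
```

Calling double_lt_short(3.14159e0, 4) mirrors the query above. Going straight from the integer to a float would give the same answer here, which is the "more direct" route mentioned at the end of the paragraph.
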
 > > That's it for the moment ...
 > >
 > >
 > > ----------------------------------------------------------------------------
 > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
 > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
 > > [3] http://www.w3.org/TR/xquery/#mapping
 > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
 > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
 > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
 > >
 > >
 >
 > --
 > -eric
 >
 > office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
 >                         Shonan Fujisawa Campus, Keio University,
 >                         5322 Endo, Fujisawa, Kanagawa 252-8520
 >                         JAPAN
 >         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
 > cell:   +81.90.6533.3882
 >
 > (eric@w3.org)
 > Feel free to forward this message to any list for any purpose other than
 > email address distribution.
 >

Received on Thursday, 21 April 2005 03:05:32 UTC