RE: XQuery value comparisons (Part I - numerics)

 > Howard Katz wrote:
 > >
 > >  > Howard,
 > >  >
 > >  > Thanks for this - it is really helpful, especially about the type
 > >  > promotion and subtype substitution.  I think I can upgrade to that
 > >  > style of execution so I think it is doable.  I'd be interested in
 > >  > ptrs to how expression evaluation maps to existing technologies
 > >  > like SQL.
 > >
 > > Sorry, I don't have any data points in that department.
 > >
 > >  >
 > >  > Do you think this means we need to say what happens to literals of
 > >  > type XMLLiteral?
 > >
 > > Yikes, I'd forgotten all about xml literals! Do you really want to go
 > > there?
 >
 > It's only because all the major issues are clear from your email that I
 > thought of such corner cases.  Personally, I have seen little use of XML
 > literals so I don't have a strong sense of what to do or how important
 > they are.

Nice to think I've done such a useful job of it! (I just had a horrible
recollection tho of the fate of the one-eyed man in the land of the blind.
Maybe I shouldn't go there ...? ;-)

Fwiw, one issue I *didn't* get around to explicating that might have some
relevance to the current discussion is the difference in the treatment of
untyped atomics by value and general comparisons. The difference I did
mention is that value comparisons take singletons, while general comparisons
handle sequences. What I didn't discuss is that general comparisons, geared
towards xpaths, are much more lenient in how they treat untyped values. To
wit, in the value comparison,

    <e>123</e> eq "123"

the element content 123 is typed as xdt:untypedAtomic by the atomization
process in the absence of a schema. Items of type untypedAtomic in value
comparisons are converted to xs:string, and the string compare function is
ultimately called. If we tweak the query slightly however and say,

    <e>123</e> eq 123

we throw a type error, since you can't compare a string (resulting from
atomization on the lhs) to an integer (the rhs is an integer literal).

If the above queries are restated as general comparisons however, eg

         <e>123</e> = "123"

the rules change. In this case, the untypedAtomic 123 is converted to a type
determined by _the other operand_: to xs:string when that operand is a string
(or itself untyped), and to xs:double when it is numeric. The net effect here
is the same: we ultimately invoke a string compare function, with the same
result. However, in the case of the second query, restated as a general
comparison:

       <e>123</e> = 123

the untypedAtomic 123 now gets converted to a numeric (xs:double, since the
other operand is numeric), and we do a numeric compare, instead of failing as
happened in the case of the corresponding value comparison.
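
A minimal Python sketch of the two behaviours may help here. It is a toy
model, not an XQuery engine: the Atomic class and the value_eq / general_eq
helpers are made-up names for illustration, and it assumes the untyped
operand is cast to xs:double when the other side is numeric (for this
example the outcome is the same as an integer compare).

```python
from dataclasses import dataclass

NUMERIC = {"xs:integer", "xs:decimal", "xs:float", "xs:double"}
UNTYPED = "xdt:untypedAtomic"


@dataclass(frozen=True)
class Atomic:
    """Toy atomic value: a type name plus a Python payload (hypothetical)."""
    type: str
    value: object


def compare_eq(a, b):
    """Equality on typed operands; mixing strings and numerics is a type error."""
    if a.type == "xs:string" and b.type == "xs:string":
        return a.value == b.value
    if a.type in NUMERIC and b.type in NUMERIC:
        return a.value == b.value
    raise TypeError(f"cannot compare {a.type} to {b.type}")


def value_eq(a, b):
    """Models 'eq': an untyped operand is always treated as a string."""
    def as_string(x):
        return Atomic("xs:string", str(x.value)) if x.type == UNTYPED else x
    return compare_eq(as_string(a), as_string(b))


def general_eq(a, b):
    """Models '=' on singletons: an untyped operand follows the other operand."""
    def cast(x, other):
        if x.type != UNTYPED:
            return x
        if other.type in NUMERIC:
            # Assumption: numeric on the other side means a cast to xs:double.
            return Atomic("xs:double", float(x.value))
        return Atomic("xs:string", str(x.value))
    return compare_eq(cast(a, b), cast(b, a))


e = Atomic(UNTYPED, "123")                        # atomized <e>123</e>, no schema
print(value_eq(e, Atomic("xs:string", "123")))    # string compare
print(general_eq(e, Atomic("xs:integer", 123)))   # numeric compare
```

In this model, value_eq(e, Atomic("xs:integer", 123)) raises a TypeError,
which mirrors the failing value comparison above.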

I'm not sure if there's relevancy in the above, but maybe it'll tweak some
useful thoughts ...

Howard

 >
 > 	Andy
 >
 > > :-) The one thing that springs to mind is that comparisons against xml
 > > literals would be like XQuery comparisons against untyped nodes, where
 > > the contents are extracted via atomization. I'd like to go do a quick
 > > review on atomizing node content if you're really interested ...
 > >
 > > Howard
 > >
 > >  >
 > >  > I prefer the forms =, != etc. rather than eq, ne, gt, ... because we
 > >  > just need one class of comparisons.  And we don't have issues of
 > >  > serializing "<".
 > >  >
 > >  > 	In the 95%*95%
 > >  > 	Andy
 > >  >
 > >  > -------- Original Message --------
 > >  > > From: Howard Katz <>
 > >  > > Date: 19 April 2005 03:58
 > >  > >
 > >  > > I have less time available for this than I'd hoped, so I'm going to
 > >  > > present an extremely pithy look at what I consider the bare, bare
 > >  > > essentials of XQuery-based comparison semantics and not do a full,
 > >  > > feature-by-feature comparison against what we're doing in sparql,
 > >  > > except in a few instances. I'm also going to call this Part I and
 > >  > > only look at numerics. Part II (if I can find the time to do it, and
 > >  > > there's interest) will look at strings, dates, times, and the other
 > >  > > remaining XML Schema built-in datatypes.
 > >  > >
 > >  > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
 > >  > > egregious errors in the following. 95% of readers find the following
 > >  > > information correct 95% of the time. :-)
 > >  > >
 > >  > > Howard
 > >  > >
 > >  > > Value vs. General Comparisons
 > >  > > ------------------------------------
 > >  > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and
 > >  > > general (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are
 > >  > > used for comparing singletons; general comparisons have their raison
 > >  > > d'etre in the requirements of XPath and are for comparing sequences.
 > >  > > Since sparql doesn't have sequences, and general comparisons
 > >  > > generally devolve into item-by-item comparisons using value
 > >  > > semantics anyway (with a few exceptions I won't go into at the
 > >  > > moment), I'm only going to look at value comparisons here. (I might
 > >  > > look at general comparisons in Part II if there seems to be an
 > >  > > interest; I haven't decided yet whether there is or isn't something
 > >  > > useful to be learned from that topic.)
 > >  > >
 > >  > > I'll note that while sparql appears to be using value comparisons
 > >  > > (ie, singletons only), it uses the operator symbol set from general
 > >  > > comparisons. I think it's arguable whether this is good, bad, or
 > >  > > indifferent; if we wanted to be precise, we should probably be using
 > >  > > the value-comparisons symbol set, but since those symbols seem to be
 > >  > > fortran-based and thus likely to be viewed as a wondrous
 > >  > > cosmological mystery by anybody under the age of 50 or so, :-) I see
 > >  > > no problem with using the more familiar (=, !=, >, <, >=, <=)
 > >  > > symbols. In the following where I'm talking about value comparisons
 > >  > > specifically, I'll use the proper value-comparison operators from
 > >  > > XQuery so as to (hopefully!) not further confuse the issue.
 > >  > >
 > >  > > Atomization
 > >  > > --------------
 > >  > > The first step in doing a comparison is atomization, in which each
 > >  > > operand is reduced to a sequence of atomic values and types. In
 > >  > > value comparisons, the atomized operands must be either singleton
 > >  > > atomic values or the empty (null) sequence. Atomizing the literal
 > >  > > "2" results in a single value "2" of type string. Atomizing an
 > >  > > element <e>2</e> without an accompanying schema results in a value
 > >  > > "2" of type xdt:untypedAtomic. If something ends up as
 > >  > > xdt:untypedAtomic, it is treated as a string (general comparisons do
 > >  > > things slightly differently).
 > >  > >
 > >  > > After atomization:
 > >  > >
 > >  > > o If either operand is null, a null is returned.
 > >  > > o If the cardinality of either operand is > 1, a type error is
 > >  > >   thrown.
 > >  > > o Otherwise an xs:boolean result is returned, showing the results of
 > >  > >   the comparison.
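
The three rules in the list above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: atomized operands are
modelled as Python lists, None stands in for the empty (null) sequence, and
value_compare is a hypothetical helper name, not anything from the spec.

```python
import operator


def value_compare(left, right, op):
    """Apply a value comparison to two atomized operands (lists of atomics)."""
    if not left or not right:
        return None                     # empty sequence in, empty sequence out
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs singleton operands")
    return bool(op(left[0], right[0]))  # otherwise a boolean result


print(value_compare([1], [2], operator.lt))
print(value_compare([], [2], operator.lt))
```

The checks run in the same order as the list, so an empty operand wins over
an over-long one in this model.
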
 > >  > >
 > >  > > Once the operands have been atomized, the proper comparison function
 > >  > > from the Binary Operator table in the Working Draft [3] needs to be
 > >  > > identified for the two operands. Comparison functions operate on
 > >  > > "similar" types; if the types of the operands are too dissimilar, a
 > >  > > type error is thrown. What do I mean by "similar" and "dissimilar"
 > >  > > (my own terminology; not part of the formal specification)?
 > >  > > Similarity means that both operands must be of the same type to
 > >  > > begin with or can be converted to be of the same type through either
 > >  > > type promotion [4] or subtype substitution [5] (see below).
 > >  > >
 > >  > > First, here's a counter-example: strings and any form of numeric are
 > >  > > dissimilar. The query:
 > >  > >
 > >  > >      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to xs:string"
 > >  > >
 > >  > > throws a type error because strings can't be compared to numerics.
 > >  > >
 > >  > > Numeric comparisons
 > >  > > -------------------------
 > >  > > On the other hand,
 > >  > >
 > >  > >     1 lt 2.0    => true
 > >  > >
 > >  > > compares a numeric (xs:integer) against a numeric (xs:decimal) using
 > >  > > the numeric comparison function, op:numeric-less-than( a, b ).
 > >  > >
 > >  > > Numeric comparisons allow the greatest degree of operand
 > >  > > dissimilarity, since there are actually sixteen numeric subtypes in
 > >  > > the XML Schema built-in datatypes hierarchy [6] that can be passed
 > >  > > in as arguments to numeric functions such as op:numeric-less-than()
 > >  > > above.
 > >  > >
 > >  > > There are actually four sub-varieties of numeric functions per each
 > >  > > op: function: one version to handle floats, one to handle doubles,
 > >  > > one to handle decimals, and one to handle integers. If other
 > >  > > datatypes are to be compared, they need to be converted to one of
 > >  > > these four types first. The algorithm for doing the conversion can
 > >  > > be presented as a multiway "if" statement. Assuming that both
 > >  > > operands are numeric:
 > >  > >
 > >  > > if either of the two operands is of type double
 > >  > >        convert the other to double and call the appropriate
 > >  > >        compare-doubles() function
 > >  > > else if either of the operands is of type float
 > >  > >        convert the other to float and call the appropriate
 > >  > >        compare-floats() function
 > >  > > else if either of the operands is of type decimal
 > >  > >        convert the other to decimal and call the appropriate
 > >  > >        compare-decimals() function
 > >  > > else
 > >  > >        convert both to integer (if necessary) and call the
 > >  > >        appropriate compare-integers() function
 > >  > >
 > >  > > In the case of
 > >  > >
 > >  > >      1 lt 2.0
 > >  > >
 > >  > > for example (xs:integer vs xs:decimal), the compare-decimals()
 > >  > > version of op:numeric-less-than() ends up getting called.
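
As a rough illustration, the selection of a compare function can be sketched
in Python. The sketch assumes the promotion ladder xs:integer, xs:decimal,
xs:float, xs:double (lowest to highest), so the operand sitting higher on the
ladder decides which version is used. The names LADDER, common_type, convert
and numeric_less_than are hypothetical, and since Python has a single binary
floating-point type, xs:float and xs:double both map onto it here.

```python
from decimal import Decimal

# Assumed promotion ladder, lowest to highest.
LADDER = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]


def common_type(t1, t2):
    """Pick whichever of the two types sits higher on the ladder."""
    return max(t1, t2, key=LADDER.index)


def convert(value, target):
    """Convert a Python number to the representation used for `target`."""
    if target == "xs:integer":
        return int(value)
    if target == "xs:decimal":
        return Decimal(str(value))
    return float(value)  # xs:float and xs:double share Python's float


def numeric_less_than(a, b):
    """a and b are (type name, value) pairs; returns (type used, result)."""
    (ta, va), (tb, vb) = a, b
    target = common_type(ta, tb)
    return target, convert(va, target) < convert(vb, target)


# 1 lt 2.0 : xs:integer against xs:decimal
print(numeric_less_than(("xs:integer", 1), ("xs:decimal", Decimal("2.0"))))
```

For the 1 lt 2.0 case the chosen type is xs:decimal, matching the
compare-decimals() version mentioned above.
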
 > >  > >
 > >  > > Type Promotion
 > >  > > --------------------
 > >  > > The word "converts" in the algorithm refers to both the mechanisms
 > >  > > of type promotion [4] and subtype substitution [5], depending on
 > >  > > what the source and target numeric types are. If a float is being
 > >  > > converted to a double, for example, type promotion is used (the
 > >  > > reverse direction, double to float, is not a promotion). Decimals
 > >  > > (or any type derived from decimal) can also be promoted to either
 > >  > > double or float. (I find the term "promotion" here a bit misleading
 > >  > > when talking about doubles and floats, since to me promotion seems
 > >  > > to imply movement or casting "up" a type hierarchy. Floats and
 > >  > > doubles however are at the same level in the XML Schema built-in
 > >  > > type hierarchy [6], and neither is superior or subordinate to the
 > >  > > other in terms of derivation.)
 > >  > >
 > >  > > There's a second variety of type promotion in XQuery where any value
 > >  > > of type xs:anyURI can be promoted to string, so that any operator
 > >  > > that compares strings can take an xs:anyURI type of argument.
 > >  > >
 > >  > > Subtype Substitution
 > >  > > -------------------------
 > >  > > Subtype substitution results when a subtype is used where its
 > >  > > supertype is required. In the last two branches of the above "if"
 > >  > > statement, any numeric type that's subordinate to xs:decimal can be
 > >  > > used as an argument to the appropriate compare-decimals() function,
 > >  > > and anything subordinate to xs:integer can be used as an argument to
 > >  > > compare-integers(). For example, in the comparison
 > >  > >
 > >  > >      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
 > >  > >
 > >  > > both operands are passed into the compare-integers() version of
 > >  > > op:numeric-less-than() and compared as integers without casting.
 > >  > > xs:integer is the lowest common type that is superordinate to both
 > >  > > nonPositiveInteger and nonNegativeInteger.
 > >  > >
 > >  > > Depending on the types of the operands, several conversions might be
 > >  > > done internally to bring both operands to a common (or similar)
 > >  > > type. For example,
 > >  > >
 > >  > >     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
 > >  > >
 > >  > > first up-converts the xs:short to xs:decimal using subtype
 > >  > > substitution, and then type-promotes the xs:decimal to xs:double, so
 > >  > > that the compare-doubles() version of op:numeric-less-than() can be
 > >  > > used. The implementation needn't go through all the intermediate
 > >  > > stages if it can be done more directly (I believe).
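
The two-stage conversion can be traced with a small Python sketch. It
assumes only a partial slice of the XML Schema built-in hierarchy [6] (the
BASE table below lists just the types needed here), and is_subtype and
conversion_steps are made-up helper names for illustration.

```python
# Partial derivation table: type -> the type it is derived from.
BASE = {
    "xs:byte": "xs:short",
    "xs:short": "xs:int",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:integer": "xs:decimal",
}


def is_subtype(t, ancestor):
    """True if t is `ancestor` or is derived from it via BASE."""
    while t is not None:
        if t == ancestor:
            return True
        t = BASE.get(t)
    return False


def conversion_steps(source, target):
    """List the (mechanism, from, to) steps that take source to target."""
    if source == target:
        return []
    if is_subtype(source, target):
        return [("subtype substitution", source, target)]
    if target in ("xs:float", "xs:double") and is_subtype(source, "xs:decimal"):
        steps = []
        if source != "xs:decimal":
            steps.append(("subtype substitution", source, "xs:decimal"))
        steps.append(("type promotion", "xs:decimal", target))
        return steps
    if source == "xs:float" and target == "xs:double":
        return [("type promotion", "xs:float", "xs:double")]
    raise TypeError(f"no conversion from {source} to {target}")


for step in conversion_steps("xs:short", "xs:double"):
    print(step)
```

For xs:short against xs:double this lists a substitution up to xs:decimal
followed by a promotion to xs:double; a real implementation is free to
collapse the two.
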
 > >  > >
 > >  > > That's it for the moment ...
 > >  > >
 > >  > >
 > >  > > ------------------------------------------------------------------------
 > >  > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
 > >  > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
 > >  > > [3] http://www.w3.org/TR/xquery/#mapping
 > >  > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
 > >  > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
 > >  > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
 > >  >
 > >
 > >
 > >
 >
 >

Received on Tuesday, 19 April 2005 17:25:22 UTC