Re: XQuery value comparisons (Part I - numerics)

On Tue, Apr 19, 2005 at 10:24:55AM -0700, Howard Katz wrote:
> 
>  > Howard Katz wrote:
>  > >
>  > >  > Howard,
>  > >  >
>  > >  > Thanks for this - it is really helpful, especially about the type
>  > >  > promotion and subtype substitution.  I think I can upgrade
>  > to that style
>  > >  > of execution so I think it is doable.  I'd be interested in
>  > ptrs to how
>  > >  > expression evaluation maps to existing technologies like SQL.
>  > >
>  > > Sorry, I don't have any data points in that department.
>  > >
>  > >  >
>  > >  > Do you think this means we need to say what happens to
>  > literals of type
>  > >  > XMLLiteral?
>  > >
>  > > Yikes, I'd forgotten all about xml literals! Do you really
>  > want to go there?
>  >
>  > It's only because all the major issues are clear from your email
>  > that I thought
>  > of such corner cases.  Personally, I have seen little use of XML
>  > literals so I
>  > don't have a strong sense of what to do or how important they are.
> 
> Nice to think I've done such a useful job of it! (I just had a horrible
> recollection tho of the fate of the one-eyed man in the land of the blind.
> Maybe I shouldn't go there ...? ;-)
> 
> Fwiw, one issue I *didn't* get around to explicating that might have some
> relevance to the current discussion is the difference in the treatment of
> untyped atomics by value and general comparisons. The difference I did
> mention is that value comparisons take singletons, while general comparisons
> handle sequences. What I didn't discuss is that general comparisons, geared
> towards xpaths, are much more lenient in how they treat untyped values. To
> wit, in the value comparison,
> 
>     <e>123</e> eq "123"
> 
> the element content 123 is typed as xdt:untypedAtomic by the atomization
> process in the absence of a schema. Items of type untypedAtomic in value
> comparisons are converted to xs:string, and the string compare
> function is

As you observed earlier, we seem to be using value comparison with
general comparison operator syntax. This automatic promotion to string
is consistent with that.

> ultimately called. If we tweak the query slightly however and say,
> 
>     <e>123</e> eq 123
> 
> we throw a type error, since you can't compare a string (resulting from
> atomization on the lhs) to an integer (the rhs is an integer literal).
> 
> If the above queries are restated as general comparisons however, eg
> 
>          <e>123</e> = "123"
> 
> the rules change. In this case, the rule is that the untypedAtomic 123 is
> converted to the type of _the other operand_. The net effect is the same: we
> ultimately invoke a string compare function, with the same result. However,
> in the case of the second query, restated as a general comparison:
> 
>        <e>123</e> = 123
> 
> the untypedAtomic 123 now gets converted to an integer (the type of the
> other operand), and we do an integer compare, instead of failing as happened
> in the case of the corresponding value comparison.

This is what scared me. How does it know what the other type is? For
instance, does
        <e>123</e> = <e>123.0</e>
work? Neither has a type. Lexically they differ. How about
        <e>123</e> = 123.0
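
A small Python model may make these cases concrete. It is a sketch under
assumptions, not an implementation of any engine: the Untyped class and the
function names are invented here, only strings and numerics are modelled, and
it assumes the general-comparison rules in which an untyped operand facing a
numeric is cast to xs:double, while two untyped operands are compared as
strings.

```python
from decimal import Decimal


class Untyped(str):
    """Hypothetical stand-in for xdt:untypedAtomic, e.g. atomized <e>123</e>."""


def is_numeric(v):
    # bool is excluded because Python treats it as a subclass of int.
    return isinstance(v, (int, float, Decimal)) and not isinstance(v, bool)


def value_eq(a, b):
    """Model of `a eq b`: untyped operands are cast to xs:string first."""
    a = str(a) if isinstance(a, Untyped) else a
    b = str(b) if isinstance(b, Untyped) else b
    if is_numeric(a) and is_numeric(b):
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    raise TypeError(f"cannot compare {type(a).__name__} to {type(b).__name__}")


def general_eq(a, b):
    """Model of `a = b` for singletons: untyped operands adapt to the other side."""
    if isinstance(a, Untyped) and isinstance(b, Untyped):
        a, b = str(a), str(b)  # both untyped: compare as strings
    elif isinstance(a, Untyped):
        a = float(a) if is_numeric(b) else str(a)  # assumed cast to xs:double
    elif isinstance(b, Untyped):
        b = float(b) if is_numeric(a) else str(b)
    return value_eq(a, b)
```

Under these assumptions, <e>123</e> = <e>123.0</e> comes out false (a string
compare of "123" against "123.0"), while <e>123</e> = 123.0 comes out true.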

> I'm not sure if there's relevancy in the above, but maybe it'll tweak some
> useful thoughts ...
> 
> Howard
> 
>  >
>  > 	Andy
>  >
>  > > :-) The one thing that springs to mind is that comparisons against xml
>  > > literals would be like XQuery comparisons against untyped
>  > nodes, where the
>  > > contents are extracted via atomization. I'd like to go do a
>  > quick review on
>  > > atomizing node content if you're really interested ...
>  > >
>  > > Howard
>  > >
>  > >  >
>  > >  > I prefer the forms =, != etc. rather than eq, ne, gt, ... because we
>  > >  > just need one class of comparisons.  And we don't have issues of
>  > >  > serializing "<".
>  > >  >
>  > >  > 	In the 95%*95%
>  > >  > 	Andy
>  > >  >
>  > >  > -------- Original Message --------
>  > >  > > From: Howard Katz <>
>  > >  > > Date: 19 April 2005 03:58
>  > >  > >
>  > >  > > I have less time available for this than I'd hoped, so
>  > I'm going to
>  > >  > > present
>  > >  > > an extremely pithy look at what I consider the bare, bare
>  > essentials
>  > >  > of
>  > >  > > XQuery-based comparison semantics and not do a full,
>  > >  > feature-by-feature
>  > >  > > comparison against what we're doing in sparql, except in a few
>  > >  > > instances.
>  > >  > > I'm also going to call this Part I and only look at
>  > numerics. Part II
>  > >  > > (if I
>  > >  > > can find the time to do it, and there's interest) will look at
>  > >  > strings,
>  > >  > > dates, times, and the other remaining XML Schema built-in
>  > datatypes.
>  > >  > >
>  > >  > > I'd be grateful for (gentle!) feedback if anyone finds
>  > I've made any
>  > >  > > egregious errors in the following. 95% of readers find
>  > the following
>  > >  > > information correct 95% of the time. :-)
>  > >  > >
>  > >  > > Howard
>  > >  > >
>  > >  > > Value vs. General Comparisons
>  > >  > > ------------------------------------
>  > >  > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and
>  > >  > > general (=, !=, >, <, >=, <=) comparisons [2]. Value
>  > comparisons are
>  > >  > > used for
>  > >  > > comparing singletons; general comparisons have their
>  > raison d'etre in
>  > >  > > the
>  > >  > > requirements of XPath and are for comparing sequences.
>  > Since sparql
>  > >  > > doesn't
>  > >  > > have sequences, and general comparisons generally devolve into
>  > >  > > item-by-item
>  > >  > > comparisons using value semantics anyway (with a few exceptions I
>  > >  > won't
>  > >  > > go
>  > >  > > into at the moment), I'm only going to look at value
>  > comparisons here.
>  > >  > > (I
>  > >  > > might look at general comparisons in Part II if there
>  > seems to be an
>  > >  > > interest; I haven't decided yet whether there is or isn't
>  > something
>  > >  > > useful
>  > >  > > to be learned from that topic.)
>  > >  > >
>  > >  > > I'll note that while sparql appears to be using value
>  > comparisons (ie,
>  > >  > > singletons only), it uses the operator symbol set from general
>  > >  > > comparisons.
>  > >  > > I think it's arguable whether this is good, bad, or
>  > indifferent; if we
>  > >  > > wanted to be precise, we should probably be using the
>  > >  > value-comparisons
>  > >  > > symbol set, but since those symbols seem to be
>  > fortran-based and thus
>  > >  > > likely to be viewed as a wondrous cosmological mystery by anybody
>  > >  > under
>  > >  > > the
>  > >  > > age of 50 or so, :-) I see no problem with using the more
>  > familiar (=,
>  > >  > > !=,
>  > >  > > > , <, >=, <=) symbols. In the following where I'm
>  > talking about value
>  > >  > > comparisons specifically, I'll use the proper value-comparison
>  > >  > operators
>  > >  > > from XQuery so as to (hopefully!) not further confuse the issue.
>  > >  > >
>  > >  > > Atomization
>  > >  > > --------------
>  > >  > > The first step in doing a comparison is atomization, in which each
>  > >  > > operand
>  > >  > > is reduced to a sequence of atomic values and types. In value
>  > >  > > comparisons,
>  > >  > > the atomized operands must be either singleton atomic
>  > values or the
>  > >  > > empty (null) sequence. Atomizing the literal "2" results
>  > in a single
>  > >  > > value "2" of
>  > >  > > type string. Atomizing an element <e>2</e> without an accompanying
>  > >  > > schema
>  > >  > > results in a value "2" of type xdt:untypedAtomic. If
>  > something ends up
>  > >  > > as xdt:untypedAtomic, it is treated as a string (general
>  > >  > > comparisons do things slightly differently).
>  > >  > >
>  > >  > > After atomization:
>  > >  > >
>  > >  > > o If either operand is null, a null is returned.
>  > >  > > o If the cardinality of either operand is > 1, a type
>  > error is thrown.
>  > >  > > o Otherwise an xs:boolean result is returned, showing the
>  > results of
>  > >  > the
>  > >  > > comparison.
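
The three rules above can be sketched in a few lines of Python. This is only
an illustration, not spec text: the function name value_lt is made up, a
Python list stands in for an atomized sequence, and None stands in for the
empty sequence.

```python
from typing import Optional, Sequence


def value_lt(lhs: Sequence, rhs: Sequence) -> Optional[bool]:
    """Hypothetical model of the cardinality checks for `lhs lt rhs`."""
    # Rule 1: an empty operand yields the empty sequence (None here).
    if len(lhs) == 0 or len(rhs) == 0:
        return None
    # Rule 2: more than one item on either side is a type error.
    if len(lhs) > 1 or len(rhs) > 1:
        raise TypeError("value comparison needs singleton operands")
    # Rule 3: otherwise compare the two items and return a boolean.
    return lhs[0] < rhs[0]
```

The checks run in the order the bullets list them, so an empty operand is
handled before the cardinality test.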
>  > >  > >
>  > >  > > Once the operands have been atomized, the proper
>  > comparison function
>  > >  > > from
>  > >  > > the Binary Operator table in the Working Draft [3] needs to be
>  > >  > > identified
>  > >  > > for the two operands. Comparison functions operate on
>  > "similar" types;
>  > >  > > if
>  > >  > > the types of the operands are too dissimilar, a type
>  > error is thrown.
>  > >  > > What
>  > >  > > do I mean by "similar" and "dissimilar" (my own
>  > terminology; not part
>  > >  > > of the
>  > >  > > formal specification)? Similarity means that both
>  > operands must be of
>  > >  > > the
>  > >  > > same type to begin with or can be converted to be of the same type
>  > >  > > through
>  > >  > > either type promotion [4] or subtype substitution [5] (see below).
>  > >  > >
>  > >  > > First, here's a counter-example: strings and any form of
>  > numeric are
>  > >  > > dissimilar. The query:
>  > >  > >
>  > >  > >      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot
>  > compare xs:integer
>  > >  > to
>  > >  > > xs:string"
>  > >  > >
>  > >  > > throws a type error because strings can't be compared to numerics.
>  > >  > >
>  > >  > > Numeric comparisons
>  > >  > > -------------------------
>  > >  > > On the other hand,
>  > >  > >
>  > >  > >     1 lt 2.0    => true
>  > >  > >
>  > >  > > compares a numeric (xs:integer) against a numeric
>  > (xs:decimal) using
>  > >  > the
>  > >  > > numeric comparison function, op:numeric-less-than( a, b ).
>  > >  > >
>  > >  > > Numeric comparisons allow the greatest degree of operand
>  > >  > dissimilarity,
>  > >  > > since there are actually sixteen numeric subtypes in the
>  > XML Schema
>  > >  > > built-in
>  > >  > > datatypes hierarchy [6] that can be passed in as
>  > arguments to numeric
>  > >  > > functions such as op:numeric-less-than() above.
>  > >  > >
>  > >  > > There are actually four sub-varieties of numeric
>  > functions per each
>  > >  > op:
>  > >  > > function: one version to handle floats, one to handle
>  > doubles, one to
>  > >  > > handle
>  > >  > > decimals, and one to handle integers. If other datatypes are to be
>  > >  > > compared,
>  > >  > > they need to be converted to one of these four types first. The
>  > >  > > algorithm
>  > >  > > for doing the conversion can be presented as a multiway "if"
>  > >  > statement.
>  > >  > > Assuming that both operands are numeric:
>  > >  > >
>  > >  > > if either of the two operands is of type double
>  > >  > >        convert the other to double and call the appropriate
>  > >  > >        compare-doubles() function
>  > >  > > else if either of the operands is of type float
>  > >  > >        convert the other to float and call the appropriate
>  > >  > >        compare-floats() function
>  > >  > > else if either of the operands is of type decimal
>  > >  > >        convert the other to decimal and call the appropriate
>  > >  > >        compare-decimals() function
>  > >  > > else
>  > >  > >        convert both to integer (if necessary) and call the
>  > >  > >        appropriate compare-integers() function
>  > >  > >
>  > >  > > In the case of
>  > >  > >
>  > >  > >      1 lt 2.0
>  > >  > >
>  > >  > > for example (xs:integer vs xs:decimal), the
>  > compare-decimals() version
>  > >  > > of op:numeric-less-than() ends up getting called.
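
The selection of a compare variant can be sketched in Python as below. The
names (BASE, comparison_variant) and the short table of types are
illustrative assumptions, not anything from the Working Draft. The sketch
tests for xs:double before xs:float on the understanding that XQuery promotes
float to double and not the reverse.

```python
# Hypothetical table: which of the four primitive numeric kinds each
# type ends up as after subtype substitution. Only a few types are listed.
BASE = {
    "xs:double": "double",
    "xs:float": "float",
    "xs:decimal": "decimal",
    "xs:integer": "integer",
    "xs:short": "integer",
    "xs:nonPositiveInteger": "integer",
    "xs:nonNegativeInteger": "integer",
}


def comparison_variant(left: str, right: str) -> str:
    """Return which compare-*() variant would handle the two operand types."""
    kinds = {BASE[left], BASE[right]}
    if "double" in kinds:
        return "double"   # the other operand is converted to double
    if "float" in kinds:
        return "float"    # the other operand is converted to float
    if "decimal" in kinds:
        return "decimal"  # an integer operand is treated as a decimal
    return "integer"      # both operands are integers or integer subtypes
```

For the operands of 1 lt 2.0, comparison_variant("xs:integer", "xs:decimal")
returns "decimal", matching the compare-decimals() case described above.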
>  > >  > >
>  > >  > > Type Promotion
>  > >  > > --------------------
>  > >  > > The word "converts" in the algorithm refers to both the
>  > mechanisms of
>  > >  > > type
>  > >  > > promotion [4] and subtype substitution [5], depending on what the
>  > >  > > source and
>  > >  > > target numeric types are. If a float is being converted to a
>  > >  > > double, for example, type promotion is used. Decimals (or any type
>  > derived from
>  > >  > > decimal)
>  > >  > > can also be promoted to either double or float. (I find the term
>  > >  > > "promotion"
>  > >  > > here a bit misleading when talking about doubles and
>  > floats, since to
>  > >  > me
>  > >  > > promotion seems to imply movement or casting "up" a type
>  > hierarchy.
>  > >  > > Floats
>  > >  > > and doubles however are at the same level in the XML
>  > Schema built-in
>  > >  > > type
>  > >  > > hierarchy [6], and neither is superior or subordinate to
>  > the other in
>  > >  > > terms
>  > >  > > of derivation.)
>  > >  > >
>  > >  > > There's a second variety of type promotion in XQuery
>  > where any value
>  > >  > of
>  > >  > > type
>  > >  > > xs:anyURI can be promoted to string, so that any operator that
>  > >  > compares
>  > >  > > strings can take an xs:anyURI type of argument.
>  > >  > >
>  > >  > > Subtype Substitution
>  > >  > > -------------------------
>  > >  > > Subtype substitution results when a subtype is used where its
>  > >  > supertype
>  > >  > > is
>  > >  > > required. In the last two branches of the above "if" statement,
>  > any numeric
>  > >  > > type
>  > >  > > that's subordinate to xs:decimal can be used as an argument to the
>  > >  > > appropriate compare-decimals() function, and anything
>  > subordinate to
>  > >  > > xs:integer can be used as an argument to compare-integers(). For
>  > >  > > example, in
>  > >  > > the comparison
>  > >  > >
>  > >  > >      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
>  > >  > >
>  > >  > > both operands are passed into the compare-integers() version of
>  > >  > > op:numeric-less-than() and compared as integers without casting.
>  > >  > > xs:integer is the lowest common type that is superordinate to both
>  > >  > > nonPositiveInteger and nonNegativeInteger.
>  > >  > >
>  > >  > > Depending on the types of the operands, several
>  > conversions might be
>  > >  > > done
>  > >  > > internally to bring both operands to a common (or
>  > similar) type. For
>  > >  > > example,
>  > >  > >
>  > >  > >     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
>  > >  > >
>  > >  > > first up-converts the xs:short to xs:decimal using subtype
>  > >  > > substitution, and
>  > >  > > then type-promotes the xs:decimal to xs:double, so that the
>  > >  > > compare-doubles() version of op:numeric-less-than() can
>  > be used. The
>  > >  > > implementation needn't go through all the intermediate
>  > stages if it
>  > >  > can
>  > >  > > be
>  > >  > > done more directly (I believe).
>  > >  > >
>  > >  > > That's it for the moment ...
>  > >  > >
>  > >  > >
>  > >  >
>  > >  > > ------------------------------------------------------------------------
>  > >  > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
>  > >  > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
>  > >  > > [3] http://www.w3.org/TR/xquery/#mapping
>  > >  > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
>  > >  > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
>  > >  > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
>  > >  >
>  > >
>  > >
>  > >
>  >
>  >
> 
> 

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Thursday, 21 April 2005 00:15:36 UTC