Re: XQuery value comparisons (Part I - numerics)

On Tue, Apr 19, 2005 at 08:44:02AM -0700, Howard Katz wrote:
> 
> 
>  > Howard,
>  >
>  > Thanks for this - it is really helpful, especially about the type
>  > promotion and subtype substitution.  I think I can upgrade to that style
>  > of execution so I think it is doable.  I'd be interested in ptrs to how
>  > expression evaluation maps to existing technologies like SQL.
> 
> Sorry, I don't have any data points in that department.
> 
>  >
>  > Do you think this means we need to say what happens to literals of type
>  > XMLLiteral?
> 
> Yikes, I'd forgotten all about xml literals! Do you really want to go there?

I'd say no. We have a small set of simple types that we support, to
the extent that we know when they are equivalent or smaller or
larger. Apart from that, we support node equivalence via the RDF
semantics. I bet that's good enough for a large number of use cases.

I'm reasonably confident that we're not painting ourselves into a corner.

> :-) The one thing that springs to mind is that comparisons against xml
> literals would be like XQuery comparisons against untyped nodes, where the
> contents are extracted via atomization. I'd like to go do a quick review on
> atomizing node content if you're really interested ...
> 
> Howard
> 
>  >
>  > I prefer the forms =, != etc   rather than eq, ne, gt,..  because we
>  > just need one class of comparisons.  And we don't have issues of
>  > serializing "<".
>  >
>  > 	In the 95%*95%
>  > 	Andy
>  >
>  > -------- Original Message --------
>  > > From: Howard Katz <>
>  > > Date: 19 April 2005 03:58
>  > >
>  > > I have less time available for this than I'd hoped, so I'm going to
>  > > present an extremely pithy look at what I consider the bare, bare
>  > > essentials of XQuery-based comparison semantics and not do a full,
>  > > feature-by-feature comparison against what we're doing in sparql,
>  > > except in a few instances. I'm also going to call this Part I and only
>  > > look at numerics. Part II (if I can find the time to do it, and
>  > > there's interest) will look at strings, dates, times, and the other
>  > > remaining XML Schema built-in datatypes.
>  > >
>  > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
>  > > egregious errors in the following. 95% of readers find the following
>  > > information correct 95% of the time. :-)
>  > >
>  > > Howard
>  > >
>  > > Value vs. General Comparisons
>  > > ------------------------------------
>  > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and
>  > > general (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are
>  > > used for comparing singletons; general comparisons have their raison
>  > > d'etre in the requirements of XPath and are for comparing sequences.
>  > > Since sparql doesn't have sequences, and general comparisons generally
>  > > devolve into item-by-item comparisons using value semantics anyway
>  > > (with a few exceptions I won't go into at the moment), I'm only going
>  > > to look at value comparisons here. (I might look at general
>  > > comparisons in Part II if there seems to be an interest; I haven't
>  > > decided yet whether there is or isn't something useful to be learned
>  > > from that topic.)
>  > >
>  > > I'll note that while sparql appears to be using value comparisons (ie,
>  > > singletons only), it uses the operator symbol set from general
>  > > comparisons. I think it's arguable whether this is good, bad, or
>  > > indifferent; if we wanted to be precise, we should probably be using
>  > > the value-comparisons symbol set, but since those symbols seem to be
>  > > fortran-based and thus likely to be viewed as a wondrous cosmological
>  > > mystery by anybody under the age of 50 or so, :-) I see no problem
>  > > with using the more familiar (=, !=, >, <, >=, <=) symbols. In the
>  > > following where I'm talking about value comparisons specifically, I'll
>  > > use the proper value-comparison operators from XQuery so as to
>  > > (hopefully!) not further confuse the issue.
>  > >
>  > > Atomization
>  > > --------------
>  > > The first step in doing a comparison is atomization, in which each
>  > > operand is reduced to a sequence of atomic values and types. In value
>  > > comparisons, the atomized operands must be either singleton atomic
>  > > values or the empty (null) sequence. Atomizing the literal "2" results
>  > > in a single value "2" of type string. Atomizing an element <e>2</e>
>  > > without an accompanying schema results in a value "2" of type
>  > > xdt:untypedAtomic. If something ends up as xdt:untypedAtomic, it is
>  > > treated as a string (general comparisons do things slightly
>  > > differently).
>  > >
>  > > After atomization:
>  > >
>  > > o If either operand is null, a null is returned.
>  > > o If the cardinality of either operand is > 1, a type error is thrown.
>  > > o Otherwise an xs:boolean result is returned, showing the results of
>  > > the comparison.
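
A minimal Python sketch of the three rules above, assuming the operands
arrive as already-atomized Python sequences and that an empty sequence
stands in for the null result. The names value_lt and XQueryTypeError are
made up for illustration; they are not part of XQuery or of any
implementation.

```python
from typing import Optional, Sequence


class XQueryTypeError(Exception):
    """Hypothetical stand-in for an XQuery type error."""


def value_lt(left: Sequence, right: Sequence) -> Optional[bool]:
    """Apply the post-atomization rules to a 'lt' value comparison.

    The rules are checked in the order they are listed in the text.
    """
    # Rule 1: an empty (null) operand makes the whole result null.
    if len(left) == 0 or len(right) == 0:
        return None
    # Rule 2: more than one item on either side is a type error.
    if len(left) > 1 or len(right) > 1:
        raise XQueryTypeError("value comparison needs singleton operands")
    # Rule 3: two singletons give a boolean result.
    return left[0] < right[0]
```

Here value_lt([1], [2]) gives True, value_lt([], [2]) gives None, and
value_lt([1, 2], [3]) raises the error. The sketch says nothing about
whether the two singleton types are comparable; that check comes next.
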
>  > >
>  > > Once the operands have been atomized, the proper comparison function
>  > > from the Binary Operator table in the Working Draft [3] needs to be
>  > > identified for the two operands. Comparison functions operate on
>  > > "similar" types; if the types of the operands are too dissimilar, a
>  > > type error is thrown. What do I mean by "similar" and "dissimilar" (my
>  > > own terminology; not part of the formal specification)? Similarity
>  > > means that both operands must be of the same type to begin with or can
>  > > be converted to be of the same type through either type promotion [4]
>  > > or subtype substitution [5] (see below).
>  > >
>  > > First, here's a counter-example: strings and any form of numeric are
>  > > dissimilar. The query:
>  > >
>  > >      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to xs:string"
>  > >
>  > > throws a type error because strings can't be compared to numerics.
>  > >
>  > > Numeric comparisons
>  > > -------------------------
>  > > On the other hand,
>  > >
>  > >     1 lt 2.0    => true
>  > >
>  > > compares a numeric (xs:integer) against a numeric (xs:decimal) using
>  > > the numeric comparison function, op:numeric-less-than( a, b ).
>  > >
>  > > Numeric comparisons allow the greatest degree of operand
>  > > dissimilarity, since there are actually sixteen numeric subtypes in
>  > > the XML Schema built-in datatypes hierarchy [6] that can be passed in
>  > > as arguments to numeric functions such as op:numeric-less-than()
>  > > above.
>  > >
>  > > There are actually four sub-varieties of numeric functions per each
>  > > op: function: one version to handle floats, one to handle doubles, one
>  > > to handle decimals, and one to handle integers. If other datatypes are
>  > > to be compared, they need to be converted to one of these four types
>  > > first. The algorithm for doing the conversion can be presented as a
>  > > multiway "if" statement. Assuming that both operands are numeric:
>  > >
>  > > if either of the two operands is of type double
>  > >        convert the other to double and call the appropriate
>  > >        compare-doubles() function
>  > > else if either of the operands is of type float
>  > >        convert the other to float and call the appropriate
>  > >        compare-floats() function
>  > > else if either of the operands is of type decimal
>  > >        convert the other to decimal and call the appropriate
>  > >        compare-decimals() function
>  > > else
>  > >        convert both to integer (if necessary) and call the appropriate
>  > >        compare-integers() function
>  > >
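
A rough Python sketch of that selection step, assuming the XQuery promotion
order integer, decimal, float, double, in which xs:double takes precedence
over xs:float because a float can be promoted to a double but not the
reverse. Both operands are assumed to have been reduced to one of the four
base types already. The function names and the compare-* labels are
hypothetical, borrowed from the pseudocode for illustration only.

```python
# Promotion order assumed from the XQuery type-promotion rules:
# a type can be converted to any type that appears later in the list.
LADDER = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]

# Hypothetical labels for the four sub-varieties of a numeric op: function.
COMPARE_FN = {
    "xs:integer": "compare-integers",
    "xs:decimal": "compare-decimals",
    "xs:float": "compare-floats",
    "xs:double": "compare-doubles",
}


def common_numeric_type(a: str, b: str) -> str:
    """Return the type both operands are converted to before comparing."""
    # The operand sitting further along the ladder decides the target type.
    return max(a, b, key=LADDER.index)


def pick_compare_function(a: str, b: str) -> str:
    """Return the label of the sub-variety that handles the operand pair."""
    return COMPARE_FN[common_numeric_type(a, b)]
```

For xs:integer against xs:decimal this picks compare-decimals; for xs:float
against xs:double it picks compare-doubles.
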
>  > > In the case of
>  > >
>  > >      1 lt 2.0
>  > >
>  > > for example (xs:integer vs xs:decimal), the compare-decimals() version
>  > > of op:numeric-less-than() ends up getting called.
>  > >
>  > > Type Promotion
>  > > --------------------
>  > > The word "converts" in the algorithm refers to both the mechanisms of
>  > > type promotion [4] and subtype substitution [5], depending on what the
>  > > source and target numeric types are. If a float is being converted to
>  > > a double, for example, type promotion is used. Decimals (or any type
>  > > derived from decimal) can also be promoted to either double or float.
>  > > (I find the term "promotion" here a bit misleading when talking about
>  > > doubles and floats, since to me promotion seems to imply movement or
>  > > casting "up" a type hierarchy. Floats and doubles however are at the
>  > > same level in the XML Schema built-in type hierarchy [6], and neither
>  > > is superior or subordinate to the other in terms of derivation.)
>  > >
>  > > There's a second variety of type promotion in XQuery where any value
>  > > of type xs:anyURI can be promoted to string, so that any operator that
>  > > compares strings can take an xs:anyURI type of argument.
>  > >
>  > > Subtype Substitution
>  > > -------------------------
>  > > Subtype substitution results when a subtype is used where its
>  > > supertype is required. In the last two branches of the above "if"
>  > > statement, any numeric type that's subordinate to xs:decimal can be
>  > > used as an argument to the appropriate compare-decimals() function,
>  > > and anything subordinate to xs:integer can be used as an argument to
>  > > compare-integers(). For example, in the comparison
>  > >
>  > >      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
>  > >
>  > > both operands are passed into the compare-integers() version of
>  > > op:numeric-less-than() and compared as integers without casting.
>  > > xs:integer is the lowest common type that is superordinate to both
>  > > nonPositiveInteger and nonNegativeInteger.
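
One way to picture subtype substitution is as a walk up the derivation
chain until one of the four base numeric types is reached. The sketch below
assumes the derivation relationships from the XML Schema built-in datatype
hierarchy [6]; the table and the function name are illustrative only and
not taken from any implementation.

```python
# Each derived numeric type mapped to its immediate base type,
# following the XML Schema built-in datatype hierarchy.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
    "xs:unsignedByte": "xs:unsignedShort",
    "xs:positiveInteger": "xs:nonNegativeInteger",
}

# The four types that the numeric op: functions are defined on.
BASE = {"xs:integer", "xs:decimal", "xs:float", "xs:double"}


def base_numeric_type(type_name: str) -> str:
    """Walk up the derivation chain to the nearest of the four base types."""
    while type_name not in BASE:
        type_name = PARENT[type_name]  # KeyError for a non-numeric type
    return type_name
```

Both xs:nonPositiveInteger and xs:nonNegativeInteger resolve to xs:integer
here, which is why the comparison above lands in the integer version.
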
>  > >
>  > > Depending on the types of the operands, several conversions might be
>  > > done internally to bring both operands to a common (or similar) type.
>  > > For example,
>  > >
>  > >     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
>  > >
>  > > first up-converts the xs:short to xs:decimal using subtype
>  > > substitution, and then type-promotes the xs:decimal to xs:double, so
>  > > that the compare-doubles() version of op:numeric-less-than() can be
>  > > used. The implementation needn't go through all the intermediate
>  > > stages if it can be done more directly (I believe).
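
The two-stage conversion can be traced by hand in Python, assuming that a
Python int stands in for xs:short, decimal.Decimal for xs:decimal, and a
Python float (an IEEE double) for xs:double. This only mirrors the stages
named above; it is not how any particular XQuery engine does it.

```python
from decimal import Decimal

short_value = 4                    # stands in for xs:short( 4 )
as_decimal = Decimal(short_value)  # subtype substitution: treat as xs:decimal
as_double = float(as_decimal)      # type promotion: xs:decimal -> xs:double
result = 3.14159e0 < as_double     # the compare-doubles step

print(result)
```

A direct float(short_value) would give the same double, which matches the
remark that the intermediate stages can be skipped.
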
>  > >
>  > > That's it for the moment ...
>  > >
>  > >
>  > > ------------------------------------------------------------------------
>  > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
>  > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
>  > > [3] http://www.w3.org/TR/xquery/#mapping
>  > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
>  > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
>  > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
>  >
> 
> 

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Wednesday, 20 April 2005 22:50:29 UTC