Re: XQuery value comparisons (Part I - numerics)

On Tue, Apr 19, 2005 at 05:26:26PM +0100, Seaborne, Andy wrote:
> 
> 
> 
> Howard Katz wrote:
> >
> > > Howard,
> > >
> > > Thanks for this - it is really helpful, especially about the type
> > > promotion and subtype substitution.  I think I can upgrade to that style
> > > of execution so I think it is doable.  I'd be interested in ptrs to how
> > > expression evaluation maps to existing technologies like SQL.
> >
> >Sorry, I don't have any data points in that department.
> >
> > >
> > > Do you think this means we need to say what happens to literals of type
> > > XMLLiteral?
> >
> >Yikes, I'd forgotten all about xml literals! Do you really want to go 
> >there?
> 
> It's only because all the major issues are clear from your email that I 
> thought of such corner cases.  Personally, I have seen little use of XML 
> literals so I don't have a strong sense of what to do or how important they 
> are.

Annotea uses them for shipping XHTML documents around. This is just a
convenience of packaging, though. All that happens is that it gets stored
in the KB with a hint that it is better written as an XMLLiteral than
as xml-escaped CDATA.

> 	Andy
> 
> >:-) The one thing that springs to mind is that comparisons against xml
> >literals would be like XQuery comparisons against untyped nodes, where the
> >contents are extracted via atomization. I'd like to go do a quick review on
> >atomizing node content if you're really interested ...
> >
> >Howard
> >
> > >
> > > I prefer the forms =, !=, etc. rather than eq, ne, gt, ... because we
> > > need just one class of comparisons.  And we don't have issues of
> > > serializing "<".
> > >
> > > 	In the 95%*95%
> > > 	Andy
> > >
> > > -------- Original Message --------
> > > > From: Howard Katz <>
> > > > Date: 19 April 2005 03:58
> > > >
> > > > > I have less time available for this than I'd hoped, so I'm going to
> > > > > present an extremely pithy look at what I consider the bare, bare
> > > > > essentials of XQuery-based comparison semantics and not do a full,
> > > > > feature-by-feature comparison against what we're doing in sparql,
> > > > > except in a few instances. I'm also going to call this Part I and only
> > > > > look at numerics. Part II (if I can find the time to do it, and
> > > > > there's interest) will look at strings, dates, times, and the other
> > > > > remaining XML Schema built-in datatypes.
> > > >
> > > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
> > > > egregious errors in the following. 95% of readers find the following
> > > > information correct 95% of the time. :-)
> > > >
> > > > Howard
> > > >
> > > > Value vs. General Comparisons
> > > > ------------------------------------
> > > > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and
> > > > > general (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are
> > > > > used for comparing singletons; general comparisons have their raison
> > > > > d'etre in the requirements of XPath and are for comparing sequences.
> > > > > Since sparql doesn't have sequences, and general comparisons generally
> > > > > devolve into item-by-item comparisons using value semantics anyway
> > > > > (with a few exceptions I won't go into at the moment), I'm only going
> > > > > to look at value comparisons here. (I might look at general
> > > > > comparisons in Part II if there seems to be an interest; I haven't
> > > > > decided yet whether there is or isn't something useful to be learned
> > > > > from that topic.)
> > > >
> > > > > I'll note that while sparql appears to be using value comparisons (ie,
> > > > > singletons only), it uses the operator symbol set from general
> > > > > comparisons. I think it's arguable whether this is good, bad, or
> > > > > indifferent; if we wanted to be precise, we should probably be using
> > > > > the value-comparisons symbol set, but since those symbols seem to be
> > > > > fortran-based and thus likely to be viewed as a wondrous cosmological
> > > > > mystery by anybody under the age of 50 or so, :-) I see no problem
> > > > > with using the more familiar (=, !=, >, <, >=, <=) symbols. In the
> > > > > following where I'm talking about value comparisons specifically, I'll
> > > > > use the proper value-comparison operators from XQuery so as to
> > > > > (hopefully!) not further confuse the issue.
> > > >
> > > > Atomization
> > > > --------------
> > > > > The first step in doing a comparison is atomization, in which each
> > > > > operand is reduced to a sequence of atomic values and types. In value
> > > > > comparisons, the atomized operands must be either singleton atomic
> > > > > values or the empty (null) sequence. Atomizing the literal "2" results
> > > > > in a single value "2" of type string. Atomizing an element <e>2</e>
> > > > > without an accompanying schema results in a value "2" of type
> > > > > xdt:untypedAtomic. If something ends up as xdt:untypedAtomic, it is
> > > > > treated as a string (general comparisons do things slightly
> > > > > differently).
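
A minimal sketch of the atomization step described above, written in Python rather than XQuery. The atomize() helper and the (value, type name) pairs are illustrative assumptions, not anything defined by the spec, and the sketch assumes no schema is in play, so element content always comes out as xdt:untypedAtomic.

```python
import xml.etree.ElementTree as ET


def atomize(operand):
    """Reduce an operand to a list of (lexical value, type name) pairs.

    Hypothetical helper: it handles only string literals and schema-less
    elements, the two cases discussed in the text.
    """
    if isinstance(operand, str):
        # A string literal atomizes to itself, typed xs:string.
        return [(operand, "xs:string")]
    if isinstance(operand, ET.Element):
        # Without a schema, element content becomes xdt:untypedAtomic.
        return [("".join(operand.itertext()), "xdt:untypedAtomic")]
    raise TypeError("unsupported operand: %r" % (operand,))


literal = atomize("2")
element = atomize(ET.fromstring("<e>2</e>"))
```

Both calls yield a one-item list, so either result would be acceptable as an operand of a value comparison.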
> > > >
> > > > After atomization:
> > > >
> > > > o If either operand is null, a null is returned.
> > > > o If the cardinality of either operand is > 1, a type error is thrown.
> > > > > o Otherwise an xs:boolean result is returned, showing the results of
> > > > >   the comparison.
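
The three rules above might be sketched like this in Python. The value_compare() name is hypothetical, operands are modelled as plain lists that have already been atomized, and None stands in for the empty (null) result. The checks are applied in the order the bullets list them, which is an assumption about precedence when one operand is empty and the other has several items.

```python
import operator


def value_compare(left, right, compare):
    """Apply `compare` to two already-atomized operands (lists of values).

    Returns None for the empty ("null") result, otherwise a bool.
    """
    # Rule 1: an empty operand makes the whole comparison empty.
    if not left or not right:
        return None
    # Rule 2: more than one item on either side is a type error.
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison requires singleton operands")
    # Rule 3: otherwise compare the two singletons and return a boolean.
    return bool(compare(left[0], right[0]))


result_true = value_compare([1], [2], operator.lt)
result_empty = value_compare([], [2], operator.lt)
```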
> > > >
> > > > > Once the operands have been atomized, the proper comparison function
> > > > > from the Binary Operator table in the Working Draft [3] needs to be
> > > > > identified for the two operands. Comparison functions operate on
> > > > > "similar" types; if the types of the operands are too dissimilar, a
> > > > > type error is thrown. What do I mean by "similar" and "dissimilar" (my
> > > > > own terminology; not part of the formal specification)? Similarity
> > > > > means that both operands must be of the same type to begin with or can
> > > > > be converted to be of the same type through either type promotion [4]
> > > > > or subtype substitution [5] (see below).
> > > >
> > > > First, here's a counter-example: strings and any form of numeric are
> > > > dissimilar. The query:
> > > >
> > > >      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer
> > > to
> > > > xs:string"
> > > >
> > > > throws a type error because strings can't be compared to numerics.
> > > >
> > > > Numeric comparisons
> > > > -------------------------
> > > > On the other hand,
> > > >
> > > >     1 lt 2.0    => false
> > > >
> > > > compares a numeric (xs:integer) against a numeric (xs:decimal) using
> > > the
> > > > numeric comparison function, op:numeric-less-than( a, b ).
> > > >
> > > > > Numeric comparisons allow the greatest degree of operand
> > > > > dissimilarity, since there are actually sixteen numeric subtypes in
> > > > > the XML Schema built-in datatypes hierarchy [6] that can be passed in
> > > > > as arguments to numeric functions such as op:numeric-less-than()
> > > > > above.
> > > >
> > > > > There are actually four sub-varieties of numeric functions per each
> > > > > op: function: one version to handle floats, one to handle doubles, one
> > > > > to handle decimals, and one to handle integers. If other datatypes are
> > > > > to be compared, they need to be converted to one of these four types
> > > > > first. The algorithm for doing the conversion can be presented as a
> > > > > multiway "if" statement. Assuming that both operands are numeric:
> > > >
> > > > > if either of the two operands is of type double
> > > > >        convert the other to double and call the appropriate
> > > > >        compare-doubles() function
> > > > > else if either of the operands is of type float
> > > > >        convert the other to float and call the appropriate
> > > > >        compare-floats() function
> > > > > else if either of the operands is of type decimal
> > > > >        convert the other to decimal and call the appropriate
> > > > >        compare-decimals() function
> > > > > else
> > > > >        convert both to integer (if necessary) and call the appropriate
> > > > >        compare-integers() function
> > > > >
> > > > > In the case of
> > > > >
> > > > >      1 lt 2.0
> > > > >
> > > > > for example (xs:integer vs xs:decimal), the compare-decimals() version
> > > > > of op:numeric-less-than() ends up getting called.
> > > > >
> > > > > Type Promotion
> > > > > --------------------
> > > > > The word "converts" in the algorithm refers to both the mechanisms of
> > > > > type promotion [4] and subtype substitution [5], depending on what the
> > > > > source and target numeric types are. If a float is being converted to
> > > > > a double, for example, type promotion is used. Decimals (or any type
> > > > > derived from decimal) can also be promoted to either double or float.
> > > > > (I find the term "promotion" here a bit misleading when talking about
> > > > > doubles and floats, since to me promotion seems to imply movement or
> > > > > casting "up" a type hierarchy. Floats and doubles however are at the
> > > > > same level in the XML Schema built-in type hierarchy [6], and neither
> > > > > is superior or subordinate to the other in terms of derivation.)
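
For what it's worth, the multiway "if" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each operand is a (value, type name) pair already reduced to one of the four base types, the names PRIORITY, CONVERT, common_numeric_type() and numeric_less_than() are made up here, and the order follows XQuery's promotion rule that xs:float promotes to xs:double (so double is tested first). Python has no single-precision type, so xs:float is approximated with a Python float.

```python
from decimal import Decimal

# Candidate target types, tested in order; assumes float promotes to double.
PRIORITY = ["xs:double", "xs:float", "xs:decimal", "xs:integer"]

# Python stand-ins for the four base types (xs:float is only approximated).
CONVERT = {
    "xs:double": float,
    "xs:float": float,
    "xs:decimal": Decimal,
    "xs:integer": int,
}


def common_numeric_type(left_type, right_type):
    """Pick which of the four comparison variants applies."""
    for candidate in PRIORITY:
        if candidate in (left_type, right_type):
            return candidate
    raise TypeError("operands are not numeric")


def numeric_less_than(left, right):
    """Compare two (value, base type name) pairs after conversion."""
    target = common_numeric_type(left[1], right[1])
    convert = CONVERT[target]
    return convert(left[0]) < convert(right[0])


chosen = common_numeric_type("xs:integer", "xs:decimal")
answer = numeric_less_than((1, "xs:integer"), (Decimal("2.0"), "xs:decimal"))
```

For the 1 lt 2.0 case, chosen is "xs:decimal", matching the compare-decimals() branch.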
> > > >
> > > > > There's a second variety of type promotion in XQuery where any value
> > > > > of type xs:anyURI can be promoted to string, so that any operator that
> > > > > compares strings can take an xs:anyURI type of argument.
> > > >
> > > > Subtype Substitution
> > > > -------------------------
> > > > > Subtype substitution results when a subtype is used where its
> > > > > supertype is required. In the last two branches of the above "if"
> > > > > statement, any numeric type that's subordinate to xs:decimal can be
> > > > > used as an argument to the appropriate compare-decimals() function,
> > > > > and anything subordinate to xs:integer can be used as an argument to
> > > > > compare-integers(). For example, in the comparison
> > > > >
> > > > >      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
> > > > >
> > > > > both operands are passed into the compare-integers() version of
> > > > > op:numeric-less-than() and compared as integers without casting.
> > > > > xs:integer is the lowest common type that is superordinate to both
> > > > > nonPositiveInteger and nonNegativeInteger.
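
One way to picture "lowest common superordinate type" is as a walk up the derivation tree. The sketch below is in Python; the parent table is taken from the XML Schema built-in hierarchy [6], while the function names ancestors() and lowest_common_supertype() are hypothetical.

```python
# Parent of each built-in type derived from xs:decimal (XML Schema Part 2).
BASE_TYPE = {
    "xs:integer": "xs:decimal",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
    "xs:unsignedByte": "xs:unsignedShort",
    "xs:positiveInteger": "xs:nonNegativeInteger",
}


def ancestors(type_name):
    """Return type_name followed by its supertypes, up to xs:decimal."""
    chain = [type_name]
    while chain[-1] in BASE_TYPE:
        chain.append(BASE_TYPE[chain[-1]])
    return chain


def lowest_common_supertype(a, b):
    """First type on a's ancestor chain that also appears on b's."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    return None


common = lowest_common_supertype("xs:nonPositiveInteger",
                                 "xs:nonNegativeInteger")
```

Here common comes out as "xs:integer", which is why the compare-integers() variant applies to the example.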
> > > >
> > > > > Depending on the types of the operands, several conversions might be
> > > > > done internally to bring both operands to a common (or similar) type.
> > > > > For example,
> > > > >
> > > > >     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
> > > > >
> > > > > first up-converts the xs:short to xs:decimal using subtype
> > > > > substitution, and then type-promotes the xs:decimal to xs:double, so
> > > > > that the compare-doubles() version of op:numeric-less-than() can be
> > > > > used. The implementation needn't go through all the intermediate
> > > > > stages if it can be done more directly (I believe).
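
A small Python sketch of that two-step route next to the direct shortcut. Both helper names are made up for illustration, Decimal stands in for xs:decimal and Python's float for xs:double. For values in the xs:short range the two routes should agree, since every such integer is exactly representable as a double.

```python
from decimal import Decimal


def short_to_double_stepwise(value):
    """Convert an xs:short value the long way round."""
    # Step 1: subtype substitution, treat the xs:short as an xs:decimal.
    as_decimal = Decimal(value)
    # Step 2: type promotion, xs:decimal to xs:double.
    return float(as_decimal)


def short_to_double_direct(value):
    """Convert in one step, skipping the intermediate decimal."""
    return float(value)


left = 3.14159e0
result = left < short_to_double_stepwise(4)
same = short_to_double_stepwise(4) == short_to_double_direct(4)
```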
> > > >
> > > > That's it for the moment ...
> > > >
> > > >
> > > ------------------------------------------------------------------------
> > > ----
> > > > ---
> > > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
> > > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
> > > > [3] http://www.w3.org/TR/xquery/#mapping
> > > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
> > > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
> > > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
> > >
> >
> >
> >
> 

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Wednesday, 20 April 2005 22:44:12 UTC