Re: XQuery value comparisons (Part I - numerics)

On Wed, Apr 20, 2005 at 08:05:25PM -0700, Howard Katz wrote:
> 
> 
>  > -----Original Message-----
>  > From: Eric Prud'hommeaux [mailto:eric@w3.org]
>  > Sent: Wednesday, April 20, 2005 5:01 PM
>  > To: Howard Katz
>  > Cc: RDF Data Access Working Group
>  > Subject: Re: XQuery value comparisons (Part I - numerics)
>  >
>  >
>  > On Mon, Apr 18, 2005 at 07:57:40PM -0700, Howard Katz wrote:
>  > >
>  > > I have less time available for this than I'd hoped, so I'm
>  > going to present
>  > > an extremely pithy look at what I consider the bare, bare essentials of
>  > > XQuery-based comparison semantics and not do a full, feature-by-feature
>  > > comparison against what we're doing in sparql, except in a few
>  > instances.
>  > > I'm also going to call this Part I and only look at numerics.
>  > Part II (if I
>  > > can find the time to do it, and there's interest) will look at strings,
>  > > dates, times, and the other remaining XML Schema built-in datatypes.
>  > >
>  > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
>  > > egregious errors in the following. 95% of readers find the following
>  > > information correct 95% of the time. :-)
>  > >
>  > > Howard
>  > >
>  > > Value vs. General Comparisons
>  > > ------------------------------------
>  > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1]
>  > and general
>  > > (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are used for
>  > > comparing singletons; general comparisons have their raison
>  > d'etre in the
>  > > requirements of XPath and are for comparing sequences.
>  >
>  > Is that their only use? I've unwittingly focused on value comparisons.
>  > I'd like to hear that general comparisons apply to us. If they are
>  > only needed for sequence comparison, then I am happy to carry on with
>  > my myopic examination of value comparison.
> 
> Their other use, as already described elsewhere, is to provide a greater
> possibility of a successful comparison. They're less rigorous in what they
> reject. I believe this originated in XPath 1, tho I'm not sure of the exact
> rationale. It's kind of like designing browsers not to fail on crufty (sp?)
> html.

OK. This makes me much more sure that we want to steer clear of
general comparison.

>  >
>  > >                                                        Since
>  > sparql doesn't
>  > > have sequences,
>  >
>  > http://unagi/2001/sw/DataAccess/rq23/#StandardOperations
>  > [[
>  > Unlike XPath/XQuery, SPARQL functions do not process node
>  > sequences. When interpreting the semantics of XPath functions, assume
>  > that each argument is a sequence of a single node.
>  > ]]
>  >
>  > >                 and general comparisons generally devolve into
>  > item-by-item
>  > > comparisons using value semantics anyway (with a few
>  > exceptions I won't go
>  > > into at the moment), I'm only going to look at value
>  > comparisons here. (I
>  > > might look at general comparisons in Part II if there seems to be an
>  > > interest; I haven't decided yet whether there is or isn't
>  > something useful
>  > > to be learned from that topic.)
>  > >
>  > > I'll note that while sparql appears to be using value comparisons (ie,
>  > > singletons only), it uses the operator symbol set from general
>  > comparisons.
>  > > I think it's arguable whether this is good, bad, or indifferent; if we
>  > > wanted to be precise, we should probably be using the value-comparisons
>  > > symbol set, but since those symbols seem to be fortran-based and thus
>  > > likely to be viewed as a wondrous cosmological mystery by
>  > anybody under the
>  > > age of 50 or so, :-) I see no problem with using the more
>  > familiar (=, !=,
>  >
>  > ha!
>  >
>  > > >, <, >=, <=) symbols. In the following where I'm talking about value
>  > > comparisons specifically, I'll use the proper value-comparison
>  > operators
>  > > from XQuery so as to (hopefully!) not further confuse the issue.
>  > >
>  > > Atomization
>  > > --------------
>  > > The first step in doing a comparison is atomization, in which
>  > each operand
>  > > is reduced to a sequence of atomic values and types. In value
>  > comparisons,
>  > > the atomized operands must be either singleton atomic values
>  > or the empty
>  > > (null) sequence. Atomizing the literal "2" results in a single
>  > value "2" of
>  > > type string. Atomizing an element <e>2</e> without an
>  > accompanying schema
>  > > results in a value "2" of type xdt:untypedAtomic. If something
>  > ends up as
>  > > xdt:untypedAtomic, it is treated as a string (general
>  > comparisons do things
>  > > slightly differently).
>  > >
>  > > After atomization:
>  > >
>  > > o If either operand is null, a null is returned.
>  > > o If the cardinality of either operand is > 1, a type error is thrown.
>  > > o Otherwise an xs:boolean result is returned, showing the
>  > results of the
>  > > comparison.
>  > >
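
To check my reading of the three rules above, here is a minimal sketch in Python (used purely as a model, not XQuery). The name value_lt, the use of lists for atomized operands, and None for the empty sequence are all my own assumptions, not anything from the spec:

```python
from typing import Optional, Sequence

def value_lt(left: Sequence[float], right: Sequence[float]) -> Optional[bool]:
    """Model 'left lt right' on operands that are already atomized."""
    # An empty (null) operand yields the empty sequence, modeled as None.
    if len(left) == 0 or len(right) == 0:
        return None
    # More than one item on either side is a type error.
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison requires singleton operands")
    # Otherwise the result is a plain boolean.
    return left[0] < right[0]

print(value_lt([1], [3]))   # True
print(value_lt([], [3]))    # None
```

The empty-operand check comes first, so an empty operand paired with a multi-item one gives None in this sketch; I have not checked whether the spec orders the two checks that way.
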
>  > > Once the operands have been atomized, the proper comparison
>  > function from
>  > > the Binary Operator table in the Working Draft [3] needs to be
>  > identified
>  > > for the two operands. Comparison functions operate on
>  > "similar" types; if
>  > > the types of the operands are too dissimilar, a type error is
>  > thrown. What
>  > > do I mean by "similar" and "dissimilar" (my own terminology;
>  > not part of the
>  > > formal specification)? Similarity means that both operands
>  > must be of the
>  > > same type to begin with or can be converted to be of the same
>  > type through
>  > > either type promotion [4] or subtype substitution [5] (see below).
>  > >
>  > > First, here's a counter-example: strings and any form of numeric are
>  > > dissimilar. The query:
>  > >
>  > >      1 lt "2"    => Saxon: "ERROR XPTY0004: Cannot compare
>  > xs:integer to
>  > > xs:string"
>  > >
>  > > throws a type error because strings can't be compared to numerics.
>  > >
>  > > Numeric comparisons
>  > > -------------------------
>  > > On the other hand,
>  > >
>  > >     1 lt 2.0    => false
>  >
>  > false? really? NTP didn't move the args into a domain where 1 < 2 ?
>  >
>  > > compares a numeric (xs:integer) against a numeric (xs:decimal)
>  > using the
>  > > numeric comparison function, op:numeric-less-than( a, b ).
>  > >
>  > > Numeric comparisons allow the greatest degree of operand dissimilarity,
>  > > since there are actually sixteen numeric subtypes in the XML
>  > Schema built-in
>  > > datatypes hierarchy [6] that can be passed in as arguments to numeric
>  > > functions such as op:numeric-less-than() above.
>  > >
>  > > There are actually four sub-varieties of numeric functions per each op:
>  > > function: one version to handle floats, one to handle doubles,
>  > one to handle
>  > > decimals, and one to handle integers. If other datatypes are
>  > to be compared,
>  > > they need to be converted to one of these four types first.
>  > The algorithm
>  > > for doing the conversion can be presented as a multiway "if" statement.
>  > > Assuming that both operands are numeric:
>  > >
>  > > if either of the two operands is of type double
>  > >        convert the other to double and call the appropriate
>  > > compare-doubles() function
>  > > else if either of the operands is of type float
>  > >        convert the other to float and call the appropriate
>  > > compare-floats() function
>  > > else if either of the operands is of type decimal
>  > >        convert the other to decimal and call the appropriate
>  > > compare-decimals() function
>  > > else
>  > >        convert both to integer (if necessary) and call the appropriate
>  > > compare-integers() function
>  > >
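
A rough Python model of that selection step, assuming the promotion order is integer, decimal, float, double, so that double wins when a float meets a double. The table and function names are hypothetical, and the derived-type table is only an illustrative subset:

```python
# Hypothetical model: the four types that have their own comparison function,
# listed in the assumed promotion order (later entries win).
PROMOTION_ORDER = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]

# A few derived types mapped to the type whose comparison function they can
# use via subtype substitution (illustrative subset, not the full hierarchy).
SUBSTITUTES_AS = {
    "xs:short": "xs:integer",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:nonNegativeInteger": "xs:integer",
}

def comparison_type(left: str, right: str) -> str:
    """Return the type that both operands end up being compared as."""
    left = SUBSTITUTES_AS.get(left, left)
    right = SUBSTITUTES_AS.get(right, right)
    # Pick whichever operand type sits later in the promotion order.
    return max(left, right, key=PROMOTION_ORDER.index)

print(comparison_type("xs:integer", "xs:decimal"))  # xs:decimal
print(comparison_type("xs:double", "xs:short"))     # xs:double
print(comparison_type("xs:float", "xs:double"))     # xs:double
```
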
>  > > In the case of
>  > >
>  > >      1 lt 2.0
>  > >
>  > > for example (xs:integer vs xs:decimal), the compare-decimals()
>  > version of
>  > > op:numeric-less-than() ends up getting called.
>  > >
>  > > Type Promotion
>  > > --------------------
>  > > The word "converts" in the algorithm refers to both the
>  > mechanisms of type
>  > > promotion [4] and subtype substitution [5], depending on what
>  > the source and
>  > > target numeric types are. If a float is being converted to a
>  > double, for
>  > > example, type promotion is used. Decimals (or any type derived
>  > from decimal)
>  > > can also be promoted to either double or float. (I find the
>  > term "promotion"
>  > > here a bit misleading when talking about doubles and floats,
>  > since to me
>  > > promotion seems to imply movement or casting "up" a type
>  > hierarchy. Floats
>  > > and doubles however are at the same level in the XML Schema
>  > built-in type
>  > > hierarchy [6], and neither is superior or subordinate to the
>  > other in terms
>  > > of derivation.)
>  >
>  > I assumed the justification had to do with binary representations and
>  > that decimals (whatever they are) could be represented as (fit in)
>  > floats and floats could fit in doubles. This rationale suggests that
>  > the least constrained subtype of decimal fits in a float, which I
>  > don't know to be true.
> 
> I believe what you're saying is correct. I don't believe I'm saying that's
> not the case. (If you believe me. :-)
> 
>  >
>  > > There's a second variety of type promotion in XQuery where any
>  > value of type
>  > > xs:anyURI can be promoted to string, so that any operator that compares
>  > > strings can take an xs:anyURI type of argument.
>  >
>  > I had something analogous:
>  > [[
>  > For functions and operators where the expected type is specified as
>  > numeric, untyped literals are cast to xs:double.
>  > ]]
>  > but commented it out. In fact, I think my implementation
>  > automatically casts them to string when needed. Thus, the above line
>  > should go back in, but s/xs:double/xs:string/ .
>  >
>  > > Subtype Substitution
>  > > -------------------------
>  > > Subtype substitution results when a subtype is used where its
>  > supertype is
>  > > required. In the last branch of the above "if" statement, any
>  > numeric type
>  > > that's subordinate to xs:decimal can be used as an argument to the
>  > > appropriate compare-decimals() function, and anything subordinate to
>  > > xs:integer can be used as an argument to compare-integers().
>  > For example, in
>  > > the comparison
>  > >
>  > >      xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
>  > >
>  > > both operands are passed into the compare-integers() version of
>  > > op:numeric-less-than() and compared as integers without casting.
>  > > xs:integer is the lowest common type that is superordinate to both
>  > > nonPositiveInteger and nonNegativeInteger.
>  >
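
The "lowest common type" idea can be modeled by walking up the derivation tree. A small Python sketch follows; the PARENT table follows the decimal branch of the XML Schema built-in hierarchy [6], while the function names are my own and purely illustrative:

```python
from typing import List, Optional

# Each decimal-derived built-in type mapped to the type it is derived from.
PARENT = {
    "xs:decimal": None,
    "xs:integer": "xs:decimal",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
    "xs:unsignedByte": "xs:unsignedShort",
    "xs:positiveInteger": "xs:nonNegativeInteger",
}

def ancestors(type_name: str) -> List[str]:
    """Return type_name followed by each of its supertypes, nearest first."""
    chain = []
    current: Optional[str] = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain

def common_supertype(a: str, b: str) -> Optional[str]:
    """Return the nearest type that both a and b derive from (or equal)."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    return None

print(common_supertype("xs:nonPositiveInteger", "xs:nonNegativeInteger"))  # xs:integer
```
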
>  > A difference I recall about NTP vs. subtype substitution is that NTP
>  > actually changes the type of the argument rather than harmlessly
>  > upcasting it to pass it to a function. The related spec text*:
> 
> NTP? I don't know if it's relevant, but much is made in the XQuery text
> about subtype substitution *not* actually changing the type of the converted
> operand. I've never been sure exactly why that fact was being emphasized.

Per http://unagi/TR/2005/WD-xquery-20050404/#N165E1 I think the
distinction occurs when you declare a function that takes a float:

  declare namespace test = "http://foo.example/";
  declare function test:imafloat($num as xs:float) as xs:boolean {
      (: true only if the argument kept a decimal-derived type :)
      $num instance of xs:decimal
  };

It will see the argument as a float in the function regardless of its
lineage.

  test:imafloat(xs:decimal(5)) => false


Conversely, subtype substitutions retain their original type:

  declare function test:imapositiveInteger($num as xs:integer) as xs:boolean {
      $num instance of xs:positiveInteger
  };

  test:imapositiveInteger(xs:positiveInteger(5)) => true


Having framed this, does XQuery's instance-of pay attention to the type
tree? I.e., does
  xs:positiveInteger(5) instance of xs:integer
return true or false?
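
For what it's worth, here is how I would model the two cases in Python, assuming instance-of does follow the derivation tree (I expect it does, but that is exactly the question). The table is a small illustrative subset of the hierarchy and every name in it is hypothetical:

```python
from typing import Optional

# Illustrative subset of the built-in hierarchy: type -> type it derives from.
PARENT = {
    "xs:positiveInteger": "xs:nonNegativeInteger",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:decimal": None,
    "xs:float": None,
    "xs:double": None,
}

def derives_from(type_name: str, target: str) -> bool:
    """Model 'instance of' as a walk up the derivation tree (assumption)."""
    current: Optional[str] = type_name
    while current is not None:
        if current == target:
            return True
        current = PARENT[current]
    return False

def type_seen_by_function(arg_type: str, param_type: str) -> str:
    """Return the type the function body observes for its argument."""
    # Subtype substitution: the argument keeps its own type.
    if derives_from(arg_type, param_type):
        return arg_type
    # Type promotion: the argument really becomes the parameter type.
    decimal_to_floating = (param_type in ("xs:float", "xs:double")
                           and derives_from(arg_type, "xs:decimal"))
    float_to_double = arg_type == "xs:float" and param_type == "xs:double"
    if decimal_to_floating or float_to_double:
        return param_type
    raise TypeError(f"cannot pass {arg_type} where {param_type} is expected")

seen = type_seen_by_function("xs:decimal", "xs:float")
print(seen, derives_from(seen, "xs:decimal"))          # xs:float False

seen = type_seen_by_function("xs:positiveInteger", "xs:integer")
print(seen, derives_from(seen, "xs:positiveInteger"))  # xs:positiveInteger True
```

Under that assumption the two test functions above give false and true respectively, which is the distinction I was after.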


>  > [[
>  > XML Schema [] defines a set of types derived from decimal: integer;
>  > nonPositiveInteger; negativeInteger; long; int; short; byte;
>  > nonNegativeInteger; unsignedLong; unsignedInt; unsignedShort;
>  > unsignedByte and positiveInteger. These are all treated as decimals
>  > for arithmetic operations. SPARQL does not specifically require
>  > integrity checks on derived subtypes. SPARQL has no numeric type test
>  > operators so the distinction between a primitive type and a type
>  > derived from that primitive type is unobservable.
>  > ]]
>  > * just fixed 30s ago.
>  >
>  > > Depending on the types of the operands, several conversions
>  > might be done
>  > > internally to bring both operands to a common (or similar) type. For
>  > > example,
>  > >
>  > >     xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
>  > >
>  > > first up-converts the xs:short to xs:decimal using subtype
>  > substitution, and
>  > > then type-promotes the xs:decimal to xs:double, so that the
>  > > compare-doubles() version of op:numeric-less-than() can be used. The
>  > > implementation needn't go through all the intermediate stages
>  > if it can be
>  > > done more directly (I believe).
>  > >
>  > > That's it for the moment ...
>  > >
>  > >
>  > > --------------------------------------------------------------------------------
>  > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
>  > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
>  > > [3] http://www.w3.org/TR/xquery/#mapping
>  > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
>  > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
>  > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
>  > >
>  > >
>  >
>  >
>  > office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>  >                         Shonan Fujisawa Campus, Keio University,
>  >                         5322 Endo, Fujisawa, Kanagawa 252-8520
>  >                         JAPAN
>  >         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
>  > cell:   +81.90.6533.3882
>  >
>  > (eric@w3.org)
>  > Feel free to forward this message to any list for any purpose other than
>  > email address distribution.
>  >
> 
> 

-- 
-eric

Received on Thursday, 21 April 2005 07:05:37 UTC