- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 21 Apr 2005 03:05:37 -0400
- To: Howard Katz <howardk@fatdog.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
- Message-ID: <20050421070536.GB18713@w3.org>
On Wed, Apr 20, 2005 at 08:05:25PM -0700, Howard Katz wrote:
>
>
> > -----Original Message-----
> > From: Eric Prud'hommeaux [mailto:eric@w3.org]
> > Sent: Wednesday, April 20, 2005 5:01 PM
> > To: Howard Katz
> > Cc: RDF Data Access Working Group
> > Subject: Re: XQuery value comparisons (Part I - numerics)
> >
> >
> > On Mon, Apr 18, 2005 at 07:57:40PM -0700, Howard Katz wrote:
> > >
> > > > I have less time available for this than I'd hoped, so I'm going to
> > > > present an extremely pithy look at what I consider the bare, bare
> > > > essentials of XQuery-based comparison semantics and not do a full,
> > > > feature-by-feature comparison against what we're doing in sparql,
> > > > except in a few instances. I'm also going to call this Part I and only
> > > > look at numerics. Part II (if I can find the time to do it, and
> > > > there's interest) will look at strings, dates, times, and the other
> > > > remaining XML Schema built-in datatypes.
> > >
> > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
> > > egregious errors in the following. 95% of readers find the following
> > > information correct 95% of the time. :-)
> > >
> > > Howard
> > >
> > > Value vs. General Comparisons
> > > ------------------------------------
> > > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and
> > > > general (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are
> > > > used for comparing singletons; general comparisons have their raison
> > > > d'etre in the requirements of XPath and are for comparing sequences.
> >
> > Is that their only use? I've unwittingly focused on value comparisons.
> > I'd like to hear that general comparisons apply to us. If they are
> > only needed for sequence comparison, then I am happy to carry on with
> > my myopic examination of value comparison.
>
> Their other use, as already described elsewhere, is to provide a greater
> possibility of a successful comparison. They're less rigorous in what they
> reject. I believe this originated in XPath 1, tho I'm not sure of the exact
> rationale. It's kind of like designing browsers not to fail on crufty (sp?)
> html.
OK. This makes me much more sure that we want to steer clear of
general comparison.
> >
> > > Since
> > sparql doesn't
> > > have sequences,
> >
> > http://unagi/2001/sw/DataAccess/rq23/#StandardOperations
> > [[
> > Unlike XPath/XQuery, SPARQL functions do not process node
> > sequences. When interpreting the semantics of XPath functions, assume
> > that each argument is a sequence of a single node.
> > ]]
> >
> > > and general comparisons generally devolve into
> > item-by-item
> > > comparisons using value semantics anyway (with a few
> > exceptions I won't go
> > > into at the moment), I'm only going to look at value
> > comparisons here. (I
> > > might look at general comparisons in Part II if there seems to be an
> > > interest; I haven't decided yet whether there is or isn't
> > something useful
> > > to be learned from that topic.)
> > >
> > > > I'll note that while sparql appears to be using value comparisons (ie,
> > > > singletons only), it uses the operator symbol set from general
> > > > comparisons. I think it's arguable whether this is good, bad, or
> > > > indifferent; if we wanted to be precise, we should probably be using
> > > > the value-comparisons symbol set, but since those symbols seem to be
> > > > fortran-based and thus likely to be viewed as a wondrous cosmological
> > > > mystery by anybody under the age of 50 or so, :-) I see no problem
> > > > with using the more familiar (=, !=,
> >
> > ha!
> >
> > > >, <, >=, <=) symbols. In the following where I'm talking about value
> > > comparisons specifically, I'll use the proper value-comparison
> > operators
> > > from XQuery so as to (hopefully!) not further confuse the issue.
> > >
> > > Atomization
> > > --------------
> > > > The first step in doing a comparison is atomization, in which each
> > > > operand is reduced to a sequence of atomic values and types. In value
> > > > comparisons, the atomized operands must be either singleton atomic
> > > > values or the empty (null) sequence. Atomizing the literal "2" results
> > > > in a single value "2" of type string. Atomizing an element <e>2</e>
> > > > without an accompanying schema results in a value "2" of type
> > > > xdt:untypedAtomic. If something ends up as xdt:untypedAtomic, it is
> > > > treated as a string (general comparisons do things slightly
> > > > differently).
> > >
> > > After atomization:
> > >
> > > o If either operand is null, a null is returned.
> > > o If the cardinality of either operand is > 1, a type error is thrown.
> > > o Otherwise an xs:boolean result is returned, showing the
> > results of the
> > > comparison.
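
The three post-atomization rules above can be sketched in a few lines.
This is only an illustrative model, not XQuery itself: the name
value_compare, the XQueryTypeError class, and the use of Python lists for
atomized sequences are all assumptions made for the example, and the
empty-sequence check is done first simply because the list above states
it first.

```python
import operator
from typing import Callable, Optional, Sequence


class XQueryTypeError(Exception):
    """Hypothetical stand-in for an XQuery type error such as XPTY0004."""


def value_compare(
    left: Sequence[object],
    right: Sequence[object],
    op: Callable[[object, object], bool],
) -> Optional[bool]:
    """Apply a value comparison to two already-atomized operands."""
    # Rule 1: an empty (null) operand yields an empty result, modeled as None.
    if len(left) == 0 or len(right) == 0:
        return None
    # Rule 2: more than one item in either operand is a type error.
    if len(left) > 1 or len(right) > 1:
        raise XQueryTypeError("value comparison requires singleton operands")
    # Rule 3: otherwise the result is a plain boolean.
    return op(left[0], right[0])


print(value_compare([1], [2], operator.lt))
print(value_compare([], [2], operator.lt))
```

Choosing the right comparison function for the two item types is left out
here; the sketch only covers the cardinality checks.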
> > >
> > > > Once the operands have been atomized, the proper comparison function
> > > > from the Binary Operator table in the Working Draft [3] needs to be
> > > > identified for the two operands. Comparison functions operate on
> > > > "similar" types; if the types of the operands are too dissimilar, a
> > > > type error is thrown. What do I mean by "similar" and "dissimilar" (my
> > > > own terminology; not part of the formal specification)? Similarity
> > > > means that both operands must be of the same type to begin with or can
> > > > be converted to be of the same type through either type promotion [4]
> > > > or subtype substitution [5] (see below).
> > >
> > > First, here's a counter-example: strings and any form of numeric are
> > > dissimilar. The query:
> > >
> > > > 1 lt "2" => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to
> > > > xs:string"
> > >
> > > throws a type error because strings can't be compared to numerics.
> > >
> > > Numeric comparisons
> > > -------------------------
> > > On the other hand,
> > >
> > > 1 lt 2.0 => false
> >
> > false? really? NTP didn't move the args into a domain where 1 < 2 ?
> >
> > > > compares a numeric (xs:integer) against a numeric (xs:decimal) using
> > > > the numeric comparison function, op:numeric-less-than( a, b ).
> > > >
> > > > Numeric comparisons allow the greatest degree of operand dissimilarity,
> > > > since there are actually sixteen numeric subtypes in the XML Schema
> > > > built-in datatypes hierarchy [6] that can be passed in as arguments to
> > > > numeric functions such as op:numeric-less-than() above.
> > > >
> > > > There are actually four sub-varieties of each numeric op: function:
> > > > one version to handle floats, one to handle doubles, one to handle
> > > > decimals, and one to handle integers. If other datatypes are to be
> > > > compared, they need to be converted to one of these four types first.
> > > > The algorithm for doing the conversion can be presented as a multiway
> > > > "if" statement. Assuming that both operands are numeric:
> > >
> > > > if either of the two operands is of type double
> > > >     convert the other to double and call the appropriate
> > > >     compare-doubles() function
> > > > else if either of the operands is of type float
> > > >     convert the other to float and call the appropriate
> > > >     compare-floats() function
> > > > else if either of the operands is of type decimal
> > > >     convert the other to decimal and call the appropriate
> > > >     compare-decimals() function
> > > > else
> > > >     convert both to integer (if necessary) and call the appropriate
> > > >     compare-integers() function
> > >
> > > In the case of
> > >
> > > 1 lt 2.0
> > >
> > > > for example (xs:integer vs xs:decimal), the compare-decimals() version
> > > > of op:numeric-less-than() ends up getting called.
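
As a rough sketch of the dispatch just described, the choice of comparison
variant can be modeled as picking the higher-ranked of the two operands'
base types. Everything named here (PROMOTION_ORDER, BASE_TYPE,
comparison_type) is made up for illustration, and BASE_TYPE lists only a
handful of the sixteen numeric types. The sketch assumes the ordering
integer, decimal, float, double from the XQuery operator mapping, so a
float/double pair is compared as doubles.

```python
# The four types that have their own comparison variant, ranked from the
# lowest promotion target to the highest (assumed ordering, see lead-in).
PROMOTION_ORDER = ["integer", "decimal", "float", "double"]

# Hypothetical, abbreviated map from a numeric type to the one of the four
# base types it reaches by subtype substitution.
BASE_TYPE = {
    "integer": "integer",
    "decimal": "decimal",
    "float": "float",
    "double": "double",
    "short": "integer",
    "nonPositiveInteger": "integer",
    "nonNegativeInteger": "integer",
    "positiveInteger": "integer",
}


def comparison_type(left_type: str, right_type: str) -> str:
    """Return which variant (integer/decimal/float/double) would be used."""
    left_rank = PROMOTION_ORDER.index(BASE_TYPE[left_type])
    right_rank = PROMOTION_ORDER.index(BASE_TYPE[right_type])
    # The operand with the lower rank is converted up to the higher one.
    return PROMOTION_ORDER[max(left_rank, right_rank)]


print(comparison_type("integer", "decimal"))  # the 1 lt 2.0 case
```

For the 1 lt 2.0 case this selects the decimal variant, matching the
description above.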
> > >
> > > Type Promotion
> > > --------------------
> > > > The word "converts" in the algorithm refers to both the mechanisms of
> > > > type promotion [4] and subtype substitution [5], depending on what the
> > > > source and target numeric types are. If a float is being converted to
> > > > a double, for example, type promotion is used. Decimals (or any type
> > > > derived from decimal) can also be promoted to either double or float.
> > > > (I find the term "promotion" here a bit misleading when talking about
> > > > doubles and floats, since to me promotion seems to imply movement or
> > > > casting "up" a type hierarchy. Floats and doubles however are at the
> > > > same level in the XML Schema built-in type hierarchy [6], and neither
> > > > is superior or subordinate to the other in terms of derivation.)
> >
> > I assumed the justification had to do with binary representations and
> > > that decimals (whatever they are) could be represented as (fit in)
> > floats and floats could fit in doubles. This rationale suggests that
> > the least constrained subtype of decimal fits in a float, which I
> > don't know to be true.
>
> I believe what you're saying is correct. I don't believe I'm saying that's
> not the case. (If you believe me. :-)
>
> >
> > > There's a second variety of type promotion in XQuery where any
> > value of type
> > > xs:anyURI can be promoted to string, so that any operator that compares
> > > strings can take an xs:anyURI type of argument.
> >
> > I had something analogous:
> > [[
> > For functions and operators where the expected type is specified as
> > numeric, untyped literals are cast to xs:double.
> > ]]
> > > but commented it out. In fact, I think my implementation
> > automatically casts them to string when needed. Thus, the above line
> > should go back in, but s/xs:double/xs:string/ .
> >
> > > Subtype Substitution
> > > -------------------------
> > > > Subtype substitution results when a subtype is used where its
> > > > supertype is required. In the last two branches of the above "if"
> > > > statement, any numeric type that's subordinate to xs:decimal can be
> > > > used as an argument to the appropriate compare-decimals() function,
> > > > and anything subordinate to xs:integer can be used as an argument to
> > > > compare-integers(). For example, in the comparison
> > > >
> > > > xs:nonPositiveInteger( -1 ) < xs:nonNegativeInteger( 1 )
> > > >
> > > > both operands are passed into the compare-integers() version of
> > > > op:numeric-less-than() and compared as integers without casting.
> > > > xs:integer is the lowest common type that is superordinate to both
> > > > nonPositiveInteger and nonNegativeInteger.
> >
> > > A difference I recall about NTP vs. subtype promotion is that NTP
> > > actually changes the type of the argument rather than harmlessly
> > > upcasting it to pass it to a function. The related spec text*:
>
> NTP? I don't know if it's relevant, but much is made in the XQuery text
> about subtype substitution *not* actually changing the type of the converted
> operand. I've never been sure exactly why that fact was being emphasized.
Per http://unagi/TR/2005/WD-xquery-20050404/#N165E1 I think the
distinction shows up when you declare a function that takes a float:

  declare namespace test = "http://foo.example/";
  declare function test:imafloat($num as xs:float) as xs:boolean {
    $num instance of xs:decimal
  };

The function will see the argument as a float regardless of its
lineage:

  test:imafloat(xs:decimal(5)) => false

Conversely, subtype substitutions retain their original type:

  declare function test:imapositiveInteger($num as xs:integer) as xs:boolean {
    $num instance of xs:positiveInteger
  };

  test:imapositiveInteger(xs:positiveInteger(5)) => true

Having framed this, does XQuery's instance-of pay attention to the type
tree? I.e., does

  xs:positiveInteger(5) instance of xs:integer

return true or false?
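
To make the two cases above concrete, here is a small model of argument
passing. It is a sketch under stated assumptions, not a description of any
XQuery engine: PARENT, Value, derives_from and pass_argument are
hypothetical names, the type tree is cut down to four entries, and
derives_from simply assumes that an instance-of style test walks the
derivation tree, which is exactly the open question above.

```python
from dataclasses import dataclass

# Hypothetical, abbreviated slice of the XML Schema derivation tree
# (child type -> parent type). float and double have no parent here.
PARENT = {
    "positiveInteger": "nonNegativeInteger",
    "nonNegativeInteger": "integer",
    "integer": "decimal",
}


@dataclass(frozen=True)
class Value:
    type_name: str
    lexical: str


def derives_from(type_name: str, ancestor: str) -> bool:
    """True if type_name is ancestor or is derived from it (assumed rule)."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def pass_argument(value: Value, declared: str) -> Value:
    """Model binding a value to a parameter declared with type `declared`."""
    if derives_from(value.type_name, declared):
        # Subtype substitution: the value keeps its original type.
        return value
    if declared in ("float", "double") and derives_from(value.type_name, "decimal"):
        # Promotion: the callee sees a value of the declared type instead.
        return Value(declared, value.lexical)
    raise TypeError(f"cannot pass {value.type_name} as {declared}")


promoted = pass_argument(Value("decimal", "5"), "float")
substituted = pass_argument(Value("positiveInteger", "5"), "integer")
print(promoted.type_name, substituted.type_name)
```

In this model the promoted value is no longer a decimal inside the callee,
while the substituted value is still a positiveInteger.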
> > [[
> > XML Schema [] defines a set of types derived from decimal: integer;
> > nonPositiveInteger; negativeInteger; long; int; short; byte;
> > nonNegativeInteger; unsignedLong; unsignedInt; unsignedShort;
> > unsignedByte and positiveInteger. These are all treated as decimals
> > for arithmetic operations. SPARQL does not specifically require
> > integrity checks on derived subtypes. SPARQL has no numeric type test
> > operators so the distinction between a primitive type and a type
> > derived from that primitive type is unobservable.
> > ]]
> > * just fixed 30s ago.
> >
> > > Depending on the types of the operands, several conversions
> > might be done
> > > internally to bring both operands to a common (or similar) type. For
> > > example,
> > >
> > > xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
> > >
> > > > first up-converts the xs:short to xs:decimal using subtype
> > > > substitution, and then type-promotes the xs:decimal to xs:double, so
> > > > that the compare-doubles() version of op:numeric-less-than() can be
> > > > used. The implementation needn't go through all the intermediate
> > > > stages if it can be done more directly (I believe).
> > >
> > > That's it for the moment ...
> > >
> > >
> > -----------------------------------------------------------------
> > -----------
> > > ---
> > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
> > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
> > > [3] http://www.w3.org/TR/xquery/#mapping
> > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
> > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
> > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
> > >
> > >
> >
> >
> > office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
> > Shonan Fujisawa Campus, Keio University,
> > 5322 Endo, Fujisawa, Kanagawa 252-8520
> > JAPAN
> > +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> > cell: +81.90.6533.3882
> >
> > (eric@w3.org)
> > Feel free to forward this message to any list for any purpose other than
> > email address distribution.
> >
>
>
--
-eric
Received on Thursday, 21 April 2005 07:05:37 UTC