- From: Howard Katz <howardk@fatdog.com>
- Date: Wed, 20 Apr 2005 20:05:25 -0700
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
> -----Original Message-----
> From: Eric Prud'hommeaux [mailto:eric@w3.org]
> Sent: Wednesday, April 20, 2005 5:01 PM
> To: Howard Katz
> Cc: RDF Data Access Working Group
> Subject: Re: XQuery value comparisons (Part I - numerics)
>
>
> On Mon, Apr 18, 2005 at 07:57:40PM -0700, Howard Katz wrote:
> >
> > I have less time available for this than I'd hoped, so I'm going to present
> > an extremely pithy look at what I consider the bare, bare essentials of
> > XQuery-based comparison semantics and not do a full, feature-by-feature
> > comparison against what we're doing in sparql, except in a few instances.
> > I'm also going to call this Part I and only look at numerics. Part II (if I
> > can find the time to do it, and there's interest) will look at strings,
> > dates, times, and the other remaining XML Schema built-in datatypes.
> >
> > I'd be grateful for (gentle!) feedback if anyone finds I've made any
> > egregious errors in the following. 95% of readers find the following
> > information correct 95% of the time. :-)
> >
> > Howard
> >
> > Value vs. General Comparisons
> > ------------------------------------
> > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and general
> > (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are used for
> > comparing singletons; general comparisons have their raison d'etre in the
> > requirements of XPath and are for comparing sequences.
>
> Is that their only use? I've unwittingly focused on value comparisons.
> I'd like to hear that general comparisons apply to us. If they are
> only needed for sequence comparison, then I am happy to carry on with
> my myopic examination of value comparison.

Their other use, as already described elsewhere, is to provide a greater
possibility of a successful comparison. They're less rigorous in what they
reject. I believe this originated in XPath 1, though I'm not sure of the
exact rationale. It's kind of like designing browsers not to fail on crufty
(sp?) HTML.
> >
> > Since sparql doesn't have sequences,
>
> http://unagi/2001/sw/DataAccess/rq23/#StandardOperations
> [[
> Unlike XPath/XQuery, SPARQL functions do not process node
> sequences. When interpreting the semantics of XPath functions, assume
> that each argument is a sequence of a single node.
> ]]
>
> > and general comparisons generally devolve into item-by-item
> > comparisons using value semantics anyway (with a few exceptions I won't go
> > into at the moment), I'm only going to look at value comparisons here. (I
> > might look at general comparisons in Part II if there seems to be an
> > interest; I haven't decided yet whether there is or isn't something useful
> > to be learned from that topic.)
> >
> > I'll note that while sparql appears to be using value comparisons (ie,
> > singletons only), it uses the operator symbol set from general comparisons.
> > I think it's arguable whether this is good, bad, or indifferent; if we
> > wanted to be precise, we should probably be using the value-comparisons
> > symbol set, but since those symbols seem to be fortran-based and thus
> > likely to be viewed as a wondrous cosmological mystery by anybody under the
> > age of 50 or so, :-) I see no problem with using the more familiar (=, !=,
>
> ha!
>
> > >, <, >=, <=) symbols. In the following where I'm talking about value
> > comparisons specifically, I'll use the proper value-comparison operators
> > from XQuery so as to (hopefully!) not further confuse the issue.
> >
> > Atomization
> > --------------
> > The first step in doing a comparison is atomization, in which each operand
> > is reduced to a sequence of atomic values and types. In value comparisons,
> > the atomized operands must be either singleton atomic values or the empty
> > (null) sequence. Atomizing the literal "2" results in a single value "2" of
> > type string.
> > Atomizing an element <e>2</e> without an accompanying schema
> > results in a value "2" of type xdt:untypedAtomic. If something ends up as
> > xdt:untypedAtomic, it is treated as a string (general comparisons do things
> > slightly differently).
> >
> > After atomization:
> >
> > o If either operand is null, a null is returned.
> > o If the cardinality of either operand is > 1, a type error is thrown.
> > o Otherwise an xs:boolean result is returned, showing the results of the
> >   comparison.
> >
> > Once the operands have been atomized, the proper comparison function from
> > the Binary Operator table in the Working Draft [3] needs to be identified
> > for the two operands. Comparison functions operate on "similar" types; if
> > the types of the operands are too dissimilar, a type error is thrown. What
> > do I mean by "similar" and "dissimilar" (my own terminology; not part of the
> > formal specification)? Similarity means that both operands must be of the
> > same type to begin with or can be converted to be of the same type through
> > either type promotion [4] or subtype substitution [5] (see below).
> >
> > First, here's a counter-example: strings and any form of numeric are
> > dissimilar. The query:
> >
> >     1 lt "2"   => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to
> >                   xs:string"
> >
> > throws a type error because strings can't be compared to numerics.
> >
> > Numeric comparisons
> > -------------------------
> > On the other hand,
> >
> >     1 lt 2.0   => false
>
> false? really? NTP didn't move the args into a domain where 1 < 2 ?
>
> > compares a numeric (xs:integer) against a numeric (xs:decimal) using the
> > numeric comparison function, op:numeric-less-than( a, b ).
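
The three post-atomization rules and the string-vs-numeric counter-example
quoted above can be sketched in Python. This is only an illustration under
stated assumptions, not how any XQuery processor is actually written: the
names value_lt and XQueryTypeError are hypothetical, atomized operands are
modelled as Python lists, and the empty ("null") result is modelled as None.

```python
from decimal import Decimal
from typing import Optional, Sequence, Union

# Hypothetical stand-in for an atomic value: a numeric or a string.
Atomic = Union[int, float, Decimal, str]


class XQueryTypeError(TypeError):
    """Hypothetical stand-in for an XQuery type error such as XPTY0004."""


def value_lt(left: Sequence[Atomic], right: Sequence[Atomic]) -> Optional[bool]:
    """Apply `lt` to two operands that have already been atomized."""
    # An empty ("null") operand yields an empty result, modelled as None.
    if len(left) == 0 or len(right) == 0:
        return None
    # More than one item on either side is a type error.
    if len(left) > 1 or len(right) > 1:
        raise XQueryTypeError("operand is not a singleton")
    a, b = left[0], right[0]
    # Strings and numerics are "dissimilar": no comparison function applies.
    if isinstance(a, str) != isinstance(b, str):
        raise XQueryTypeError("cannot compare a numeric with a string")
    # Otherwise the comparison produces a boolean.
    return a < b
```

With this sketch, value_lt([1], [Decimal("2.0")]) returns True, value_lt([], [1])
returns None, and value_lt([1], ["2"]) raises the type error.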
> >
> > Numeric comparisons allow the greatest degree of operand dissimilarity,
> > since there are actually sixteen numeric subtypes in the XML Schema built-in
> > datatypes hierarchy [6] that can be passed in as arguments to numeric
> > functions such as op:numeric-less-than() above.
> >
> > There are actually four sub-varieties of numeric functions for each op:
> > function: one version to handle floats, one to handle doubles, one to handle
> > decimals, and one to handle integers. If other datatypes are to be compared,
> > they need to be converted to one of these four types first. The algorithm
> > for doing the conversion can be presented as a multiway "if" statement.
> > Assuming that both operands are numeric:
> >
> >     if either of the two operands is of type double
> >         convert the other to double and call the appropriate
> >         compare-doubles() function
> >     else if either of the operands is of type float
> >         convert the other to float and call the appropriate
> >         compare-floats() function
> >     else if either of the operands is of type decimal
> >         convert the other to decimal and call the appropriate
> >         compare-decimals() function
> >     else
> >         convert both to integer (if necessary) and call the appropriate
> >         compare-integers() function
> >
> > In the case of
> >
> >     1 lt 2.0
> >
> > for example (xs:integer vs xs:decimal), the compare-decimals() version of
> > op:numeric-less-than() ends up getting called.
> >
> > Type Promotion
> > --------------------
> > The word "converts" in the algorithm refers to both the mechanisms of type
> > promotion [4] and subtype substitution [5], depending on what the source and
> > target numeric types are. If a float is being converted to a double, for
> > example, type promotion is used. Decimals (or any type derived from decimal)
> > can also be promoted to either double or float.
> > (I find the term "promotion"
> > here a bit misleading when talking about doubles and floats, since to me
> > promotion seems to imply movement or casting "up" a type hierarchy. Floats
> > and doubles however are at the same level in the XML Schema built-in type
> > hierarchy [6], and neither is superior or subordinate to the other in terms
> > of derivation.)
>
> I assumed the justification had to do with binary representations and
> that decimals (whatever they are) could be represented as (fit in)
> floats and floats could fit in doubles. This rationale suggests that
> the least constrained subtype of decimal fits in a float, which I
> don't know to be true.

I believe what you're saying is correct. I don't believe I'm saying that's
not the case. (If you believe me. :-)

> > There's a second variety of type promotion in XQuery where any value of type
> > xs:anyURI can be promoted to string, so that any operator that compares
> > strings can take an xs:anyURI type of argument.
>
> I had something analogous:
> [[
> For functions and operators where the expected type is specified as
> numeric, untyped literals are cast to xs:double.
> ]]
> but commented it out. In fact, I think my implementation
> automatically casts them to string when needed. Thus, the above line
> should go back in, but s/xs:double/xs:string/ .
>
> > Subtype Substitution
> > -------------------------
> > Subtype substitution results when a subtype is used where its supertype is
> > required. In the last branch of the above "if" statement, any numeric type
> > that's subordinate to xs:decimal can be used as an argument to the
> > appropriate compare-decimals() function, and anything subordinate to
> > xs:integer can be used as an argument to compare-integers().
> > For example, in
> > the comparison
> >
> >     xs:nonPositiveInteger( -1 ) < xs:nonNegativeInteger( 1 )
> >
> > both operands are passed into the compare-integers() version of
> > op:numeric-less-than() and compared as integers without casting.
> > xs:integer is the lowest common type that is superordinate to both
> > nonPositiveInteger and nonNegativeInteger.
>
> A difference I recall about NTP vs. subtype promotion is that NTP
> actually changes the type of the argument rather than harmlessly
> upcasting it to pass it as is to a function. The related spec text*:

NTP? I don't know if it's relevant, but much is made in the XQuery text
about subtype substitution *not* actually changing the type of the converted
operand. I've never been sure exactly why that fact was being emphasized.

> [[
> XML Schema [] defines a set of types derived from decimal: integer;
> nonPositiveInteger; negativeInteger; long; int; short; byte;
> nonNegativeInteger; unsignedLong; unsignedInt; unsignedShort;
> unsignedByte and positiveInteger. These are all treated as decimals
> for arithmetic operations. SPARQL does not specifically require
> integrity checks on derived subtypes. SPARQL has no numeric type test
> operators so the distinction between a primitive type and a type
> derived from that primitive type is unobservable.
> ]]
> * just fixed 30s ago.
>
> > Depending on the types of the operands, several conversions might be done
> > internally to bring both operands to a common (or similar) type. For
> > example,
> >
> >     xs:double( 3.14159e0 ) lt xs:short( 4 )   => true
> >
> > first up-converts the xs:short to xs:decimal using subtype substitution, and
> > then type-promotes the xs:decimal to xs:double, so that the
> > compare-doubles() version of op:numeric-less-than() can be used. The
> > implementation needn't go through all the intermediate stages if it can be
> > done more directly (I believe).
> >
> > That's it for the moment ...
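
For anyone who wants to experiment, here is a rough Python sketch of how the
version of op:numeric-less-than() might be selected. It assumes the promotion
order given in XQuery Functions and Operators (xs:integer, xs:decimal,
xs:float, xs:double): both operands are brought to the first type in that
list that can hold both. Everything named here (Num, BASE, common_type,
convert, numeric_less_than) is hypothetical, BASE covers only the handful of
types used in the examples, and Python's float stands in for both xs:float
and xs:double.

```python
from decimal import Decimal
from enum import IntEnum


class Num(IntEnum):
    """The four types the numeric operators are defined on, in promotion order."""
    INTEGER = 1
    DECIMAL = 2
    FLOAT = 3
    DOUBLE = 4


# Hypothetical, partial map from a type name to its base among the four.
# Going from a derived type to its base models subtype substitution.
BASE = {
    "xs:integer": Num.INTEGER,
    "xs:short": Num.INTEGER,
    "xs:nonPositiveInteger": Num.INTEGER,
    "xs:nonNegativeInteger": Num.INTEGER,
    "xs:decimal": Num.DECIMAL,
    "xs:float": Num.FLOAT,
    "xs:double": Num.DOUBLE,
}


def common_type(left_type: str, right_type: str) -> Num:
    """Pick the type both operands are compared as (the later one in the order)."""
    return max(BASE[left_type], BASE[right_type])


def convert(value, target: Num):
    """Convert a Python number to the representation used for `target`."""
    if target is Num.INTEGER:
        return int(value)
    if target is Num.DECIMAL:
        return Decimal(str(value))
    # Assumption: Python has no 32-bit float, so FLOAT and DOUBLE share float.
    return float(value)


def numeric_less_than(left, left_type: str, right, right_type: str) -> bool:
    """Compare two numerics after bringing them to their common type."""
    target = common_type(left_type, right_type)
    return convert(left, target) < convert(right, target)
```

Under these assumptions, common_type("xs:integer", "xs:decimal") is
Num.DECIMAL and common_type("xs:double", "xs:short") is Num.DOUBLE, which
matches the compare-decimals() and compare-doubles() choices described in the
quoted text.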
> >
> > ----------------------------------------------------------------------------
> > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
> > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
> > [3] http://www.w3.org/TR/xquery/#mapping
> > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
> > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
> > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
> >
>
> --
> -eric
>
> office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>                         Shonan Fujisawa Campus, Keio University,
>                         5322 Endo, Fujisawa, Kanagawa 252-8520
>                         JAPAN
>         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> cell:   +81.90.6533.3882
>
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.
>
Received on Thursday, 21 April 2005 03:05:32 UTC