- From: Howard Katz <howardk@fatdog.com>
- Date: Mon, 18 Apr 2005 19:57:40 -0700
- To: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
I have less time available for this than I'd hoped, so I'm going to present an extremely pithy look at what I consider the bare, bare essentials of XQuery-based comparison semantics and not do a full, feature-by-feature comparison against what we're doing in sparql, except in a few instances. I'm also going to call this Part I and only look at numerics. Part II (if I can find the time to do it, and there's interest) will look at strings, dates, times, and the other remaining XML Schema built-in datatypes.

I'd be grateful for (gentle!) feedback if anyone finds I've made any egregious errors in the following. 95% of readers find the following information correct 95% of the time. :-)

Howard

Value vs. General Comparisons
------------------------------------

XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and general (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are used for comparing singletons; general comparisons have their raison d'etre in the requirements of XPath and are for comparing sequences.

Since sparql doesn't have sequences, and general comparisons generally devolve into item-by-item comparisons using value semantics anyway (with a few exceptions I won't go into at the moment), I'm only going to look at value comparisons here. (I might look at general comparisons in Part II if there seems to be an interest; I haven't decided yet whether there is or isn't something useful to be learned from that topic.)

I'll note that while sparql appears to be using value comparisons (ie, singletons only), it uses the operator symbol set from general comparisons. I think it's arguable whether this is good, bad, or indifferent; if we wanted to be precise, we should probably be using the value-comparison symbol set, but since those symbols seem to be fortran-based and thus likely to be viewed as a wondrous cosmological mystery by anybody under the age of 50 or so, :-) I see no problem with using the more familiar (=, !=, >, <, >=, <=) symbols.
In the following, where I'm talking about value comparisons specifically, I'll use the proper value-comparison operators from XQuery so as to (hopefully!) not further confuse the issue.

Atomization
--------------

The first step in doing a comparison is atomization, in which each operand is reduced to a sequence of atomic values and types. In value comparisons, the atomized operands must be either singleton atomic values or the empty (null) sequence.

Atomizing the literal "2" results in a single value "2" of type string. Atomizing an element <e>2</e> without an accompanying schema results in a value "2" of type xdt:untypedAtomic. If something ends up as xdt:untypedAtomic, it is treated as a string (general comparisons do things slightly differently).

After atomization:

  o If either operand is null, a null is returned.
  o If the cardinality of either operand is > 1, a type error is thrown.
  o Otherwise an xs:boolean result is returned, showing the results of the comparison.

Once the operands have been atomized, the proper comparison function from the Binary Operator table in the Working Draft [3] needs to be identified for the two operands. Comparison functions operate on "similar" types; if the types of the operands are too dissimilar, a type error is thrown.

What do I mean by "similar" and "dissimilar" (my own terminology; not part of the formal specification)? Similarity means that both operands must be of the same type to begin with or can be converted to be of the same type through either type promotion [4] or subtype substitution [5] (see below).

First, here's a counter-example: strings and any form of numeric are dissimilar. The query:

    1 lt "2"
    => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to xs:string"

throws a type error because strings can't be compared to numerics.
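The three post-atomization rules and the string-vs-numeric counter-example can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not anything from the Working Draft: operands are Python lists standing in for already-atomized sequences, str stands in for xs:string, Python numbers stand in for the numeric types, None stands in for the empty sequence, and the name value_lt is made up for illustration. The empty-operand check is done first, which is an ordering I have assumed.

```python
from decimal import Decimal


def value_lt(left, right):
    """Toy model of the 'lt' value comparison on already-atomized operands.

    Operands are lists; None models the empty (null) sequence result.
    """
    # Rule 1: if either operand is empty, the result is empty.
    if len(left) == 0 or len(right) == 0:
        return None
    # Rule 2: more than one item in either operand is a type error.
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison requires singleton operands")
    a, b = left[0], right[0]
    # "Dissimilar" operands: a string cannot be compared to a numeric.
    if isinstance(a, str) != isinstance(b, str):
        raise TypeError(
            f"cannot compare {type(a).__name__} to {type(b).__name__}"
        )
    # Rule 3: otherwise return a boolean.
    return a < b


print(value_lt([1], [Decimal("2.0")]))  # numeric vs numeric
print(value_lt([], [1]))                # empty operand
```

Calling value_lt([1], ["2"]) raises TypeError in this model, mirroring the Saxon error above.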
Numeric comparisons
-------------------------

On the other hand,

    1 lt 2.0
    => true

compares a numeric (xs:integer) against a numeric (xs:decimal) using the numeric comparison function, op:numeric-less-than( a, b ).

Numeric comparisons allow the greatest degree of operand dissimilarity, since there are actually sixteen numeric types in the XML Schema built-in datatypes hierarchy [6] that can be passed in as arguments to numeric functions such as op:numeric-less-than() above.

There are actually four sub-varieties of each numeric op: function: one version to handle floats, one to handle doubles, one to handle decimals, and one to handle integers. If other datatypes are to be compared, they need to be converted to one of these four types first. The algorithm for doing the conversion can be presented as a multiway "if" statement. Assuming that both operands are numeric:

    if either of the two operands is of type double
        convert the other to double and call the appropriate
        compare-doubles() function
    else if either of the operands is of type float
        convert the other to float and call the appropriate
        compare-floats() function
    else if either of the operands is of type decimal
        convert the other to decimal and call the appropriate
        compare-decimals() function
    else
        convert both to integer (if necessary) and call the
        appropriate compare-integers() function

In the case of 1 lt 2.0 for example (xs:integer vs xs:decimal), the compare-decimals() version of op:numeric-less-than() ends up getting called.

Type Promotion
--------------------

The word "convert" in the algorithm refers to both the mechanisms of type promotion [4] and subtype substitution [5], depending on what the source and target numeric types are. If a float is being converted to a double, for example, type promotion is used. Decimals (or any type derived from decimal) can also be promoted to either double or float.
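The selection of a comparison function can be sketched as a small lookup in Python. This is a rough illustration under stated assumptions, not the spec's algorithm verbatim: it assumes the promotion rules in [4], under which xs:float can be promoted to xs:double and xs:decimal to either of them. Type names are plain strings, only a handful of the derived integer types are listed, and the names BASE_OF, LADDER and comparison_type are my own inventions.

```python
# Maps a numeric type to the one of the four "comparison" types it can be
# substituted for. Only a few derived types are listed in this sketch.
BASE_OF = {
    "xs:short": "xs:integer",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:integer": "xs:integer",
    "xs:decimal": "xs:decimal",
    "xs:float": "xs:float",
    "xs:double": "xs:double",
}

# Conversion ladder, weakest first: a type can be converted to any type
# further to the right (integer is a subtype of decimal; decimal promotes
# to float or double; float promotes to double).
LADDER = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]


def comparison_type(left_type, right_type):
    """Return which of the four comparison variants would be used."""
    try:
        left = LADDER.index(BASE_OF[left_type])
        right = LADDER.index(BASE_OF[right_type])
    except KeyError as exc:
        raise TypeError(f"not a numeric type in this sketch: {exc}") from None
    # The operand that sits higher on the ladder decides the variant.
    return LADDER[max(left, right)]


print(comparison_type("xs:integer", "xs:decimal"))
print(comparison_type("xs:double", "xs:short"))
```

Taking the maximum position on the ladder is equivalent to the multiway "if": test for the strongest type first and fall through to integer.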
(I find the term "promotion" here a bit misleading when talking about doubles and floats, since to me promotion seems to imply movement or casting "up" a type hierarchy. Floats and doubles however are at the same level in the XML Schema built-in type hierarchy [6], and neither is superior or subordinate to the other in terms of derivation.)

There's a second variety of type promotion in XQuery where any value of type xs:anyURI can be promoted to string, so that any operator that compares strings can take an xs:anyURI type of argument.

Subtype Substitution
-------------------------

Subtype substitution results when a subtype is used where its supertype is required. In the last two branches of the above "if" statement, any numeric type that's subordinate to xs:decimal can be used as an argument to the appropriate compare-decimals() function, and anything subordinate to xs:integer can be used as an argument to compare-integers().

For example, in the comparison

    xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )

both operands are passed into the compare-integers() version of op:numeric-less-than() and compared as integers without casting. xs:integer is the lowest common type that is superordinate to both nonPositiveInteger and nonNegativeInteger.

Depending on the types of the operands, several conversions might be done internally to bring both operands to a common (or similar) type. For example,

    xs:double( 3.14159e0 ) lt xs:short( 4 )
    => true

first up-converts the xs:short to xs:decimal using subtype substitution, and then type-promotes the xs:decimal to xs:double, so that the compare-doubles() version of op:numeric-less-than() can be used. The implementation needn't go through all the intermediate stages if it can be done more directly (I believe).

That's it for the moment ...
-------------------------------------------------------------------------------

[1] http://www.w3.org/TR/xquery/#id-value-comparisons
[2] http://www.w3.org/TR/xquery/#id-general-comparisons
[3] http://www.w3.org/TR/xquery/#mapping
[4] http://www.w3.org/TR/xquery/#dt-type-promotion
[5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
[6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
Received on Tuesday, 19 April 2005 02:57:46 UTC