- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 20 Apr 2005 20:15:34 -0400
- To: Howard Katz <howardk@fatdog.com>
- Cc: andy.seaborne@hp.com, RDF Data Access Working Group <public-rdf-dawg@w3.org>
- Message-ID: <20050421001534.GB20950@w3.org>
On Tue, Apr 19, 2005 at 10:24:55AM -0700, Howard Katz wrote:
> > > Howard Katz wrote:
> > >
> > > > Howard,
> > > >
> > > > Thanks for this - it is really helpful, especially about the type
> > > > promotion and subtype substitution. I think I can upgrade to that
> > > > style of execution so I think it is doable. I'd be interested in
> > > > ptrs to how expression evaluation maps to existing technologies
> > > > like SQL.
> > >
> > > Sorry, I don't have any data points in that department.
> > >
> > > > Do you think this means we need to say what happens to literals
> > > > of type XMLLiteral?
> > >
> > > Yikes, I'd forgotten all about xml literals! Do you really want to
> > > go there?
> >
> > It's only because all the major issues are clear from your email
> > that I thought of such corner cases. Personally, I have seen little
> > use of XML literals so I don't have a strong sense of what to do or
> > how important they are.
>
> Nice to think I've done such a useful job of it! (I just had a
> horrible recollection tho of the fate of the one-eyed man in the land
> of the blind. Maybe I shouldn't go there ...? ;-)
>
> Fwiw, one issue I *didn't* get around to explicating that might have
> some relevance to the current discussion is the difference in the
> treatment of untyped atomics by value and general comparisons. The
> difference I did mention is that value comparisons take singletons,
> while general comparisons handle sequences. What I didn't discuss is
> that general comparisons, geared towards xpaths, are much more lenient
> in how they treat untyped values. To wit, in the value comparison,
>
>     <e>123</e> eq "123"
>
> the element content 123 is typed as xdt:untypedAtomic by the
> atomization process in the absence of a schema. Items of type
> untypedAtomic in value comparisons are converted to xs:string, and the
> string compare function is

As you observed earlier, we seem to be using value comparison with
general comparison operator syntax. This automatic promotion to string
is consistent with that.

> ultimately called. If we tweak the query slightly however and say,
>
>     <e>123</e> eq 123
>
> we throw a type error, since you can't compare a string (resulting
> from atomization on the lhs) to an integer (the rhs is an integer
> literal).
>
> If the above queries are restated as general comparisons however, eg
>
>     <e>123</e> = "123"
>
> the rules change. In this case, the rule is that the untypedAtomic 123
> is converted to the type of _the other operand_. The net effect is the
> same: we ultimately invoke a string compare function, with the same
> result. However, in the case of the second query, restated as a
> general comparison:
>
>     <e>123</e> = 123
>
> the untypedAtomic 123 now gets converted to an integer (the type of
> the other operand), and we do an integer compare, instead of failing
> as happened in the case of the corresponding value comparison.

This is what scared me. How does it know what the other type is? For
instance, does
    <e>123</e> = <e>123.0</e>
work? Neither has a type. Lexically they differ. How about
    <e>123</e> = 123.0
?

> I'm not sure if there's relevancy in the above, but maybe it'll tweak
> some useful thoughts ...
>
> Howard
>
> > Andy
> >
> > > :-) The one thing that springs to mind is that comparisons against
> > > xml literals would be like XQuery comparisons against untyped
> > > nodes, where the contents are extracted via atomization. I'd like
> > > to go do a quick review on atomizing node content if you're really
> > > interested ...
> > >
> > > Howard
> > >
> > > > I prefer the forms =, != etc rather than eq, ne, gt,.. because we
> > > > just need one class of comparisons. And we don't have issues of
> > > > serializing "<".
> > > >
> > > > In the 95%*95%
> > > > Andy
> > > >
> > > > -------- Original Message --------
> > > > > From: Howard Katz <>
> > > > > Date: 19 April 2005 03:58
> > > > >
> > > > > I have less time available for this than I'd hoped, so I'm
> > > > > going to present an extremely pithy look at what I consider
> > > > > the bare, bare essentials of XQuery-based comparison semantics
> > > > > and not do a full, feature-by-feature comparison against what
> > > > > we're doing in sparql, except in a few instances. I'm also
> > > > > going to call this Part I and only look at numerics. Part II
> > > > > (if I can find the time to do it, and there's interest) will
> > > > > look at strings, dates, times, and the other remaining XML
> > > > > Schema built-in datatypes.
> > > > >
> > > > > I'd be grateful for (gentle!) feedback if anyone finds I've
> > > > > made any egregious errors in the following. 95% of readers
> > > > > find the following information correct 95% of the time. :-)
> > > > >
> > > > > Howard
> > > > >
> > > > > Value vs. General Comparisons
> > > > > ------------------------------------
> > > > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1]
> > > > > and general (=, !=, >, <, >=, <=) comparisons [2]. Value
> > > > > comparisons are used for comparing singletons; general
> > > > > comparisons have their raison d'etre in the requirements of
> > > > > XPath and are for comparing sequences. Since sparql doesn't
> > > > > have sequences, and general comparisons generally devolve into
> > > > > item-by-item comparisons using value semantics anyway (with a
> > > > > few exceptions I won't go into at the moment), I'm only going
> > > > > to look at value comparisons here.
> > > > > (I might look at general comparisons in Part II if there seems
> > > > > to be an interest; I haven't decided yet whether there is or
> > > > > isn't something useful to be learned from that topic.)
> > > > >
> > > > > I'll note that while sparql appears to be using value
> > > > > comparisons (ie, singletons only), it uses the operator symbol
> > > > > set from general comparisons. I think it's arguable whether
> > > > > this is good, bad, or indifferent; if we wanted to be precise,
> > > > > we should probably be using the value-comparisons symbol set,
> > > > > but since those symbols seem to be Fortran-based and thus
> > > > > likely to be viewed as a wondrous cosmological mystery by
> > > > > anybody under the age of 50 or so, :-) I see no problem with
> > > > > using the more familiar (=, !=, >, <, >=, <=) symbols. In the
> > > > > following where I'm talking about value comparisons
> > > > > specifically, I'll use the proper value-comparison operators
> > > > > from XQuery so as to (hopefully!) not further confuse the
> > > > > issue.
> > > > >
> > > > > Atomization
> > > > > --------------
> > > > > The first step in doing a comparison is atomization, in which
> > > > > each operand is reduced to a sequence of atomic values and
> > > > > types. In value comparisons, the atomized operands must be
> > > > > either singleton atomic values or the empty (null) sequence.
> > > > > Atomizing the literal "2" results in a single value "2" of
> > > > > type string. Atomizing an element <e>2</e> without an
> > > > > accompanying schema results in a value "2" of type
> > > > > xdt:untypedAtomic. If something ends up as xdt:untypedAtomic,
> > > > > it is treated as a string (general comparisons do things
> > > > > slightly differently).
> > > > >
> > > > > After atomization:
> > > > >
> > > > > o If either operand is null, a null is returned.
> > > > > o If the cardinality of either operand is > 1, a type error is
> > > > >   thrown.
> > > > > o Otherwise an xs:boolean result is returned, showing the
> > > > >   results of the comparison.
> > > > >
> > > > > Once the operands have been atomized, the proper comparison
> > > > > function from the Binary Operator table in the Working Draft
> > > > > [3] needs to be identified for the two operands. Comparison
> > > > > functions operate on "similar" types; if the types of the
> > > > > operands are too dissimilar, a type error is thrown. What do I
> > > > > mean by "similar" and "dissimilar" (my own terminology; not
> > > > > part of the formal specification)? Similarity means that both
> > > > > operands must be of the same type to begin with or can be
> > > > > converted to be of the same type through either type promotion
> > > > > [4] or subtype substitution [5] (see below).
> > > > >
> > > > > First, here's a counter-example: strings and any form of
> > > > > numeric are dissimilar. The query:
> > > > >
> > > > >     1 lt "2"   => Saxon: "ERROR XPTY0004: Cannot compare
> > > > >                   xs:integer to xs:string"
> > > > >
> > > > > throws a type error because strings can't be compared to
> > > > > numerics.
> > > > >
> > > > > Numeric comparisons
> > > > > -------------------------
> > > > > On the other hand,
> > > > >
> > > > >     1 lt 2.0   => true
> > > > >
> > > > > compares a numeric (xs:integer) against a numeric (xs:decimal)
> > > > > using the numeric comparison function,
> > > > > op:numeric-less-than( a, b ).
> > > > >
> > > > > Numeric comparisons allow the greatest degree of operand
> > > > > dissimilarity, since there are actually sixteen numeric
> > > > > subtypes in the XML Schema built-in datatypes hierarchy [6]
> > > > > that can be passed in as arguments to numeric functions such
> > > > > as op:numeric-less-than() above.
> > > > >
> > > > > There are actually four sub-varieties of numeric functions per
> > > > > each op: function: one version to handle floats, one to handle
> > > > > doubles, one to handle decimals, and one to handle integers.
> > > > > If other datatypes are to be compared, they need to be
> > > > > converted to one of these four types first. The algorithm for
> > > > > doing the conversion can be presented as a multiway "if"
> > > > > statement. Assuming that both operands are numeric:
> > > > >
> > > > >     if either of the two operands is of type float
> > > > >         convert the other to float and call the appropriate
> > > > >         compare-floats() function
> > > > >     else if either of the operands is of type double
> > > > >         convert the other to double and call the appropriate
> > > > >         compare-doubles() function
> > > > >     else if either of the operands is of type decimal
> > > > >         convert the other to decimal and call the appropriate
> > > > >         compare-decimals() function
> > > > >     else
> > > > >         convert both to integer (if necessary) and call the
> > > > >         appropriate compare-integers() function
> > > > >
> > > > > In the case of
> > > > >
> > > > >     1 lt 2.0
> > > > >
> > > > > for example (xs:integer vs xs:decimal), the compare-decimals()
> > > > > version of op:numeric-less-than() ends up getting called.
> > > > >
> > > > > Type Promotion
> > > > > --------------------
> > > > > The word "converts" in the algorithm refers to both the
> > > > > mechanisms of type promotion [4] and subtype substitution [5],
> > > > > depending on what the source and target numeric types are. If
> > > > > a double is being converted to a float, for example, type
> > > > > promotion is used. Decimals (or any type derived from decimal)
> > > > > can also be promoted to either double or float.
> > > > > (I find the term "promotion" here a bit misleading when
> > > > > talking about doubles and floats, since to me promotion seems
> > > > > to imply movement or casting "up" a type hierarchy. Floats and
> > > > > doubles however are at the same level in the XML Schema
> > > > > built-in type hierarchy [6], and neither is superior or
> > > > > subordinate to the other in terms of derivation.)
> > > > >
> > > > > There's a second variety of type promotion in XQuery where any
> > > > > value of type xs:anyURI can be promoted to string, so that any
> > > > > operator that compares strings can take an xs:anyURI type of
> > > > > argument.
> > > > >
> > > > > Subtype Substitution
> > > > > -------------------------
> > > > > Subtype substitution results when a subtype is used where its
> > > > > supertype is required. In the last branch of the above "if"
> > > > > statement, any numeric type that's subordinate to xs:decimal
> > > > > can be used as an argument to the appropriate
> > > > > compare-decimals() function, and anything subordinate to
> > > > > xs:integer can be used as an argument to compare-integers().
> > > > > For example, in the comparison
> > > > >
> > > > >     xs:nonPositiveInteger( -1 ) < xs:nonNegativeInteger( 1 )
> > > > >
> > > > > both operands are passed into the compare-integers() version
> > > > > of op:numeric-less-than() and compared as integers without
> > > > > casting. xs:integer is the lowest common type that is
> > > > > superordinate to both nonPositiveInteger and
> > > > > nonNegativeInteger.
> > > > >
> > > > > Depending on the types of the operands, several conversions
> > > > > might be done internally to bring both operands to a common
> > > > > (or similar) type.
> > > > > For example,
> > > > >
> > > > >     xs:double( 3.14159e0 ) lt xs:short( 4 )  => true
> > > > >
> > > > > first up-converts the xs:short to xs:decimal using subtype
> > > > > substitution, and then type-promotes the xs:decimal to
> > > > > xs:double, so that the compare-doubles() version of
> > > > > op:numeric-less-than() can be used. The implementation needn't
> > > > > go through all the intermediate stages if it can be done more
> > > > > directly (I believe).
> > > > >
> > > > > That's it for the moment ...
> > > > >
> > > > > ---------------------------------------------------------------
> > > > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
> > > > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
> > > > > [3] http://www.w3.org/TR/xquery/#mapping
> > > > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
> > > > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
> > > > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 21 April 2005 00:15:36 UTC
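
The value-comparison steps described in the quoted message (empty operand,
untypedAtomic treated as string, subtype substitution, numeric promotion,
type error on dissimilar types) can be roughly sketched in Python. This is
only an illustration under stated assumptions, not an implementation from
the thread: the names Atomic, XQueryTypeError, SUBTYPE_OF, NUMERIC_LADDER
and value_compare are all hypothetical, only "eq" and "lt" are handled,
Python's float stands in for both xs:float and xs:double, and only the
derived types used as examples in the message are listed. The sketch ranks
xs:double above xs:float, which is the direction of numeric promotion in
the XQuery specification; the quoted pseudocode tests for float first.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Promotion ladder assumed for this sketch (widest type last).
NUMERIC_LADDER = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]

# Subtype substitution: derived type -> the ladder type it can stand in for.
# Only the derived types mentioned in the message are listed (hypothetical table).
SUBTYPE_OF = {
    "xs:short": "xs:integer",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:nonNegativeInteger": "xs:integer",
}


class XQueryTypeError(Exception):
    """Raised when two operands are too dissimilar to compare."""


@dataclass(frozen=True)
class Atomic:
    """An atomized operand: a type name plus its lexical form."""
    type: str
    lexical: str


def _normalize(item: Atomic) -> Atomic:
    # Value comparisons treat xdt:untypedAtomic as xs:string.
    if item.type == "xdt:untypedAtomic":
        return Atomic("xs:string", item.lexical)
    # Replace a derived numeric type by its ladder supertype.
    return Atomic(SUBTYPE_OF.get(item.type, item.type), item.lexical)


def _to_number(item: Atomic, target: str):
    # Convert the lexical form to a Python value of the target type.
    if target == "xs:integer":
        return int(item.lexical)
    if target == "xs:decimal":
        return Decimal(item.lexical)
    return float(item.lexical)  # approximation for xs:float and xs:double


def value_compare(op: str, left: Optional[Atomic],
                  right: Optional[Atomic]) -> Optional[bool]:
    """Sketch of 'eq' / 'lt' on already-atomized singleton operands."""
    if left is None or right is None:
        return None  # an empty operand gives an empty result
    a, b = _normalize(left), _normalize(right)
    if a.type in NUMERIC_LADDER and b.type in NUMERIC_LADDER:
        # Promote both operands to the wider of the two types.
        target = max(a.type, b.type, key=NUMERIC_LADDER.index)
        x, y = _to_number(a, target), _to_number(b, target)
    elif a.type == b.type == "xs:string":
        x, y = a.lexical, b.lexical
    else:
        raise XQueryTypeError(f"Cannot compare {a.type} to {b.type}")
    if op == "eq":
        return x == y
    if op == "lt":
        return x < y
    raise ValueError(f"unsupported operator: {op}")


untyped = Atomic("xdt:untypedAtomic", "123")
print(value_compare("eq", untyped, Atomic("xs:string", "123")))  # True
print(value_compare("lt", Atomic("xs:double", "3.14159e0"),
                    Atomic("xs:short", "4")))                    # True
```

With these definitions, comparing the untyped 123 against an xs:integer
123 with "eq" raises XQueryTypeError, matching the type error described
for <e>123</e> eq 123. General comparisons, which convert an untyped
operand based on the other operand, are deliberately not modelled here.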