- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 20 Apr 2005 20:15:34 -0400
- To: Howard Katz <howardk@fatdog.com>
- Cc: andy.seaborne@hp.com, RDF Data Access Working Group <public-rdf-dawg@w3.org>
- Message-ID: <20050421001534.GB20950@w3.org>
On Tue, Apr 19, 2005 at 10:24:55AM -0700, Howard Katz wrote:
>
> > Howard Katz wrote:
> > >
> > > > Howard,
> > > >
> > > > Thanks for this - it is really helpful, especially about the type
> > > > promotion and subtype substitution. I think I can upgrade to that style
> > > > of execution so I think it is doable. I'd be interested in ptrs to how
> > > > expression evaluation maps to existing technologies like SQL.
> > >
> > > Sorry, I don't have any data points in that department.
> > >
> > > >
> > > > Do you think this means we need to say what happens to literals of type
> > > > XMLLiteral?
> > >
> > > Yikes, I'd forgotten all about xml literals! Do you really want to go
> > > there?
> >
> > It's only because all the major issues are clear from your email that I
> > thought of such corner cases. Personally, I have seen little use of XML
> > literals so I don't have a strong sense of what to do or how important
> > they are.
>
> Nice to think I've done such a useful job of it! (I just had a horrible
> recollection tho of the fate of the one-eyed man in the land of the blind.
> Maybe I shouldn't go there ...? ;-)
>
> Fwiw, one issue I *didn't* get around to explicating that might have some
> relevance to the current discussion is the difference in the treatment of
> untyped atomics by value and general comparisons. The difference I did
> mention is that value comparisons take singletons, while general comparisons
> handle sequences. What I didn't discuss is that general comparisons, geared
> towards XPath, are much more lenient in how they treat untyped values. To
> wit, in the value comparison,
>
> <e>123</e> eq "123"
>
> the element content 123 is typed as xdt:untypedAtomic by the atomization
> process in the absence of a schema. Items of type untypedAtomic in value
> comparisons are converted to xs:string, and the string compare
> function is ultimately called.
As you observed earlier, we seem to be using value comparison with
general comparison operator syntax. This automatic promotion to string
is consistent with that.
> If we tweak the query slightly however and say,
>
> <e>123</e> eq 123
>
> we throw a type error, since you can't compare a string (resulting from
> atomization on the lhs) to an integer (the rhs is an integer literal).
>
> If the above queries are restated as general comparisons however, eg
>
> <e>123</e> = "123"
>
> the rules change. In this case, the rule is that the untypedAtomic 123 is
> converted to the type of _the other operand_. The net effect is the same: we
> ultimately invoke a string compare function, with the same result. However,
> in the case of the second query, restated as a general comparison:
>
> <e>123</e> = 123
>
> the untypedAtomic 123 now gets converted to a number (strictly, XQuery
> casts an untypedAtomic to xs:double when the other operand is numeric),
> and we do a numeric compare, instead of failing as happened in the case
> of the corresponding value comparison.
This is what scared me. How does it know what the other type is? For
instance, does
<e>123</e> = <e>123.0</e>
work? Neither has a type. Lexically they differ. How about
<e>123</e> = 123.0
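The untypedAtomic rules for singleton operands can be sketched in Python as below. This is a minimal sketch under stated assumptions: `UntypedAtomic`, `XPathTypeError`, `value_eq`, `cast_untyped` and `general_eq` are hypothetical names; Python `str`, `int` and `float` stand in for xs:string, xs:integer and xs:double; only the string and numeric cases are modelled; and the cast follows the XQuery 1.0 general-comparison rule (untypedAtomic becomes xs:double opposite a numeric, xs:string opposite a string or another untypedAtomic).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UntypedAtomic:
    """Hypothetical stand-in for xdt:untypedAtomic (text from a schema-less element)."""
    text: str


class XPathTypeError(TypeError):
    """Hypothetical stand-in for the XQuery type error err:XPTY0004."""


NUMERIC = (int, float)


def value_eq(a, b):
    """Sketch of `eq` on singletons: untypedAtomic is always treated as xs:string."""
    if isinstance(a, UntypedAtomic):
        a = a.text
    if isinstance(b, UntypedAtomic):
        b = b.text
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, NUMERIC) and isinstance(b, NUMERIC):
        return a == b
    raise XPathTypeError(f"cannot compare {type(a).__name__} to {type(b).__name__}")


def cast_untyped(u, other):
    """General-comparison cast: xs:double opposite a number, xs:string otherwise."""
    if isinstance(other, NUMERIC):
        return float(u.text)  # a non-numeric lexical form raises ValueError here
    return u.text


def general_eq(a, b):
    """Sketch of `=` on singletons, casting untypedAtomic towards the other operand."""
    if isinstance(a, UntypedAtomic):
        a = cast_untyped(a, b)
    if isinstance(b, UntypedAtomic):
        b = cast_untyped(b, a)
    return value_eq(a, b)
```

Under these rules `general_eq(UntypedAtomic("123"), UntypedAtomic("123.0"))` compares two strings and is false, while `general_eq(UntypedAtomic("123"), 123.0)` compares two doubles and is true; `value_eq(UntypedAtomic("123"), 123)` raises the type error.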
> I'm not sure if there's relevancy in the above, but maybe it'll tweak some
> useful thoughts ...
>
> Howard
>
> >
> > Andy
> >
> > > :-) The one thing that springs to mind is that comparisons against xml
> > > literals would be like XQuery comparisons against untyped nodes, where
> > > the contents are extracted via atomization. I'd like to go do a quick
> > > review on atomizing node content if you're really interested ...
> > >
> > > Howard
> > >
> > > >
> > > > I prefer the forms =, != etc rather than eq, ne, gt, ... because we
> > > > just need one class of comparisons. And we don't have issues of
> > > > serializing "<".
> > > >
> > > > In the 95%*95%
> > > > Andy
> > > >
> > > > -------- Original Message --------
> > > > > From: Howard Katz <>
> > > > > Date: 19 April 2005 03:58
> > > > >
> > > > > I have less time available for this than I'd hoped, so I'm going to
> > > > > present an extremely pithy look at what I consider the bare, bare
> > > > > essentials of XQuery-based comparison semantics and not do a full,
> > > > > feature-by-feature comparison against what we're doing in sparql,
> > > > > except in a few instances. I'm also going to call this Part I and
> > > > > only look at numerics. Part II (if I can find the time to do it, and
> > > > > there's interest) will look at strings, dates, times, and the other
> > > > > remaining XML Schema built-in datatypes.
> > > > >
> > > > > I'd be grateful for (gentle!) feedback if anyone finds I've made any
> > > > > egregious errors in the following. 95% of readers find the following
> > > > > information correct 95% of the time. :-)
> > > > >
> > > > > Howard
> > > > >
> > > > > Value vs. General Comparisons
> > > > > ------------------------------------
> > > > > XQuery has both value (eq, ne, gt, lt, ge, le) comparisons [1] and
> > > > > general (=, !=, >, <, >=, <=) comparisons [2]. Value comparisons are
> > > > > used for comparing singletons; general comparisons have their raison
> > > > > d'etre in the requirements of XPath and are for comparing sequences.
> > > > > Since sparql doesn't have sequences, and general comparisons
> > > > > generally devolve into item-by-item comparisons using value semantics
> > > > > anyway (with a few exceptions I won't go into at the moment), I'm
> > > > > only going to look at value comparisons here. (I might look at
> > > > > general comparisons in Part II if there seems to be an interest; I
> > > > > haven't decided yet whether there is or isn't something useful to be
> > > > > learned from that topic.)
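What "comparing sequences" means for a general comparison can be sketched in a few lines: the comparison is true if some pair of items, one taken from each operand, satisfies the operator. In this minimal Python sketch, lists stand in for atomized sequences, `general_compare` is a hypothetical name, and the untypedAtomic casting rules are deliberately left out.

```python
import operator
from itertools import product


def general_compare(left, right, op):
    """Existentially quantified comparison over two atomized sequences.

    Returns True if at least one pair (l, r) satisfies op(l, r); an empty
    operand therefore yields False rather than an empty result.
    """
    return any(op(l, r) for l, r in product(left, right))


if __name__ == "__main__":
    print(general_compare([1, 2], [2, 3], operator.eq))   # True: 2 = 2
    print(general_compare([1, 2], [2, 3], operator.ne))   # True: 1 != 2
    print(general_compare([], [1], operator.eq))          # False
```

The first two calls show why `=` and `!=` are not complements of each other once sequences are involved, which is one reason the singleton-only value comparisons are the better model for sparql.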
> > > > >
> > > > > I'll note that while sparql appears to be using value comparisons
> > > > > (ie, singletons only), it uses the operator symbol set from general
> > > > > comparisons. I think it's arguable whether this is good, bad, or
> > > > > indifferent; if we wanted to be precise, we should probably be using
> > > > > the value-comparisons symbol set, but since those symbols seem to be
> > > > > Fortran-based and thus likely to be viewed as a wondrous cosmological
> > > > > mystery by anybody under the age of 50 or so, :-) I see no problem
> > > > > with using the more familiar (=, !=, >, <, >=, <=) symbols. In the
> > > > > following, where I'm talking about value comparisons specifically,
> > > > > I'll use the proper value-comparison operators from XQuery so as to
> > > > > (hopefully!) not further confuse the issue.
> > > > >
> > > > > Atomization
> > > > > --------------
> > > > > The first step in doing a comparison is atomization, in which each
> > > > > operand is reduced to a sequence of atomic values and types. In value
> > > > > comparisons, the atomized operands must be either singleton atomic
> > > > > values or the empty (null) sequence. Atomizing the literal "2"
> > > > > results in a single value "2" of type string. Atomizing an element
> > > > > <e>2</e> without an accompanying schema results in a value "2" of
> > > > > type xdt:untypedAtomic. If something ends up as xdt:untypedAtomic, a
> > > > > value comparison treats it as a string (general comparisons do things
> > > > > slightly differently).
> > > > >
> > > > > After atomization:
> > > > >
> > > > > o If either operand is null, a null is returned.
> > > > > o If the cardinality of either operand is > 1, a type error is thrown.
> > > > > o Otherwise an xs:boolean result is returned, showing the results of
> > > > >   the comparison.
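The three rules above can be sketched as a small Python wrapper. Assumptions: lists stand in for atomized operands, `None` models the empty sequence that the text calls null, and `value_compare` and `XPathTypeError` are hypothetical names (the latter standing in for err:XPTY0004).

```python
import operator


class XPathTypeError(TypeError):
    """Hypothetical stand-in for the XQuery type error err:XPTY0004."""


def value_compare(left, right, op):
    """Apply a value comparison to two atomized operands (lists of atomic values).

    Returns None for an empty operand, raises XPathTypeError if either operand
    holds more than one value, and otherwise returns a bool.
    """
    if not left or not right:
        return None                        # empty sequence in -> empty result out
    if len(left) > 1 or len(right) > 1:
        raise XPathTypeError("value comparisons need singleton operands")
    return bool(op(left[0], right[0]))


if __name__ == "__main__":
    print(value_compare([1], [2], operator.lt))   # True
    print(value_compare([], [2], operator.lt))    # None
```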
> > > > >
> > > > > Once the operands have been atomized, the proper comparison function
> > > > > from the Binary Operator table in the Working Draft [3] needs to be
> > > > > identified for the two operands. Comparison functions operate on
> > > > > "similar" types; if the types of the operands are too dissimilar, a
> > > > > type error is thrown. What do I mean by "similar" and "dissimilar"
> > > > > (my own terminology; not part of the formal specification)?
> > > > > Similarity means that both operands must be of the same type to begin
> > > > > with or can be converted to be of the same type through either type
> > > > > promotion [4] or subtype substitution [5] (see below).
> > > > >
> > > > > First, here's a counter-example: strings and any form of numeric are
> > > > > dissimilar. The query:
> > > > >
> > > > > 1 lt "2" => Saxon: "ERROR XPTY0004: Cannot compare xs:integer to xs:string"
> > > > >
> > > > > throws a type error because strings can't be compared to numerics.
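The "similar types" check can be sketched by classifying each atomized value into a comparison family before calling the operator. This Python sketch is a simplification under stated assumptions: `comparison_family`, `value_lt` and `XPathTypeError` are hypothetical names, only the boolean, numeric and string families are modelled, and Python's own mixed int/float/Decimal ordering stands in for the numeric promotion rules.

```python
from decimal import Decimal


class XPathTypeError(TypeError):
    """Hypothetical stand-in for the XQuery type error err:XPTY0004."""


def comparison_family(value):
    """Classify an atomized value into a family of mutually comparable types."""
    if isinstance(value, bool):            # checked first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float, Decimal)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    raise XPathTypeError(f"unsupported type {type(value).__name__}")


def value_lt(a, b):
    """Sketch of `lt`: only values from the same family may be compared."""
    fa, fb = comparison_family(a), comparison_family(b)
    if fa != fb:
        raise XPathTypeError(f"cannot compare {fa} to {fb}")
    return a < b
```

With this sketch `value_lt(1, Decimal("2.0"))` returns True, while `value_lt(1, "2")` raises XPathTypeError, mirroring the Saxon error quoted above.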
> > > > >
> > > > > Numeric comparisons
> > > > > -------------------------
> > > > > On the other hand,
> > > > >
> > > > > 1 lt 2.0 => true
> > > > >
> > > > > compares a numeric (xs:integer) against a numeric (xs:decimal) using
> > > > > the numeric comparison function, op:numeric-less-than( a, b ).
> > > > >
> > > > > Numeric comparisons allow the greatest degree of operand
> > > > > dissimilarity, since there are actually sixteen numeric subtypes in
> > > > > the XML Schema built-in datatypes hierarchy [6] that can be passed in
> > > > > as arguments to numeric functions such as op:numeric-less-than()
> > > > > above.
> > > > >
> > > > > There are actually four sub-varieties of each numeric op: function:
> > > > > one version to handle floats, one to handle doubles, one to handle
> > > > > decimals, and one to handle integers. If other datatypes are to be
> > > > > compared, they need to be converted to one of these four types first.
> > > > > The algorithm for doing the conversion can be presented as a multiway
> > > > > "if" statement. Assuming that both operands are numeric:
> > > > >
> > > > >   if either of the two operands is of type double
> > > > >       convert the other to double and call the appropriate
> > > > >       compare-doubles() function
> > > > >   else if either of the operands is of type float
> > > > >       convert the other to float and call the appropriate
> > > > >       compare-floats() function
> > > > >   else if either of the operands is of type decimal
> > > > >       convert the other to decimal and call the appropriate
> > > > >       compare-decimals() function
> > > > >   else
> > > > >       convert both to integer (if necessary) and call the appropriate
> > > > >       compare-integers() function
> > > > >
> > > > > In the case of
> > > > >
> > > > > 1 lt 2.0
> > > > >
> > > > > for example (xs:integer vs xs:decimal), the compare-decimals() version
> > > > > of op:numeric-less-than() ends up getting called.
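The multiway "if" can be sketched in Python. Assumptions: operands are hypothetical (type_name, value) pairs restricted to the four types the op: functions handle; Python int, Decimal and float stand in for xs:integer, xs:decimal and xs:double; xs:float is emulated by rounding through IEEE single precision with `struct`, and values tagged "float" are assumed to be already single-precision. The sketch tests for xs:double before xs:float, since XQuery promotes xs:float to xs:double and never the reverse.

```python
import struct
from decimal import Decimal


def to_single(x):
    """Round a Python float to IEEE 754 single precision (emulating xs:float)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def common_numeric_type(t1, t2):
    """Mirror the multiway if: double, then float, then decimal, else integer."""
    for t in ("double", "float", "decimal"):
        if t in (t1, t2):
            return t
    return "integer"


def convert(value, src, dst):
    """Convert value from numeric type src to dst (dst is never 'below' src here)."""
    if src == dst:
        return value
    if dst == "decimal":               # integer -> decimal: subtype substitution
        return Decimal(value)
    if dst == "float":                 # integer/decimal -> float: type promotion
        return to_single(float(value))
    return float(value)                # integer/decimal/float -> double: promotion


def numeric_less_than(a, b):
    """Sketch of op:numeric-less-than on (type_name, value) pairs."""
    (t1, v1), (t2, v2) = a, b
    target = common_numeric_type(t1, t2)
    return convert(v1, t1, target) < convert(v2, t2, target)


if __name__ == "__main__":
    print(numeric_less_than(("integer", 1), ("decimal", Decimal("2.0"))))  # True
```

A loop over the candidate types is used instead of a literal if/elif chain only for brevity; the order of the tuple is what encodes the branch order.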
> > > > >
> > > > > Type Promotion
> > > > > --------------------
> > > > > The word "converts" in the algorithm refers to both the mechanisms of
> > > > > type promotion [4] and subtype substitution [5], depending on what the
> > > > > source and target numeric types are. If a float is being converted to
> > > > > a double, for example, type promotion is used. Decimals (or any type
> > > > > derived from decimal) can also be promoted to either double or float.
> > > > > (I find the term "promotion" here a bit misleading when talking about
> > > > > doubles and floats, since to me promotion seems to imply movement or
> > > > > casting "up" a type hierarchy. Floats and doubles however are at the
> > > > > same level in the XML Schema built-in type hierarchy [6], and neither
> > > > > is superior or subordinate to the other in terms of derivation.)
> > > > >
> > > > > There's a second variety of type promotion in XQuery where any value
> > > > > of type xs:anyURI can be promoted to string, so that any operator
> > > > > that compares strings can take an xs:anyURI type of argument.
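The anyURI-to-string promotion can be sketched as a pre-step in a string comparison. This is a minimal Python sketch, assuming a hypothetical `AnyURI` wrapper for xs:anyURI values; `promote_to_string` and `string_eq` are likewise hypothetical names, not an XQuery API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnyURI:
    """Hypothetical wrapper standing in for an xs:anyURI value."""
    uri: str


def promote_to_string(value):
    """Type promotion xs:anyURI -> xs:string; any other value passes through."""
    return value.uri if isinstance(value, AnyURI) else value


def string_eq(a, b):
    """Sketch of a string-comparison operator that also accepts xs:anyURI."""
    a, b = promote_to_string(a), promote_to_string(b)
    if not (isinstance(a, str) and isinstance(b, str)):
        raise TypeError("operands must be xs:string or promotable to it")
    return a == b


if __name__ == "__main__":
    print(string_eq(AnyURI("http://www.w3.org/"), "http://www.w3.org/"))  # True
```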
> > > > >
> > > > > Subtype Substitution
> > > > > -------------------------
> > > > > Subtype substitution results when a subtype is used where its
> > > > > supertype is required. In the last two branches of the above "if"
> > > > > statement, any numeric type that's subordinate to xs:decimal can be
> > > > > used as an argument to the appropriate compare-decimals() function,
> > > > > and anything subordinate to xs:integer can be used as an argument to
> > > > > compare-integers(). For example, in the comparison
> > > > >
> > > > > xs:nonPositiveInteger( -1 ) lt xs:nonNegativeInteger( 1 )
> > > > >
> > > > > both operands are passed into the compare-integers() version of
> > > > > op:numeric-less-than() and compared as integers without casting.
> > > > > xs:integer is the lowest common type that is superordinate to both
> > > > > nonPositiveInteger and nonNegativeInteger.
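Finding the type both operands can be substituted for amounts to walking up the derivation tree until the two chains meet. A minimal Python sketch, assuming a hand-written child-to-parent table that covers only the XML Schema types mentioned here (the full hierarchy is in [6]); `PARENT`, `ancestors` and `lowest_common_supertype` are hypothetical names.

```python
# Hypothetical excerpt of the XML Schema built-in derivation tree (child -> parent).
PARENT = {
    "nonPositiveInteger": "integer",
    "nonNegativeInteger": "integer",
    "short": "int",
    "int": "long",
    "long": "integer",
    "integer": "decimal",
    "decimal": None,
}


def ancestors(t):
    """Return t followed by each of its supertypes, nearest first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def lowest_common_supertype(t1, t2):
    """Nearest type that both operand types can be substituted for without casting."""
    seen = set(ancestors(t2))
    for t in ancestors(t1):
        if t in seen:
            return t
    return None
```

For the example above, `lowest_common_supertype("nonPositiveInteger", "nonNegativeInteger")` returns "integer".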
> > > > >
> > > > > Depending on the types of the operands, several conversions might be
> > > > > done internally to bring both operands to a common (or similar) type.
> > > > > For example,
> > > > >
> > > > > xs:double( 3.14159e0 ) lt xs:short( 4 ) => true
> > > > >
> > > > > first up-converts the xs:short to xs:decimal using subtype
> > > > > substitution, and then type-promotes the xs:decimal to xs:double, so
> > > > > that the compare-doubles() version of op:numeric-less-than() can be
> > > > > used. The implementation needn't go through all the intermediate
> > > > > stages if it can be done more directly (I believe).
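The two-step conversion in the xs:short example can be traced with a short Python sketch. The table covers only the types needed here, `PARENT`, `primitive_numeric` and `as_double` are hypothetical names, Python float stands in for xs:double, and the explicit intermediate xs:decimal step is shown for illustration only; as noted above, a real engine would likely convert directly.

```python
from decimal import Decimal

# Hypothetical child -> parent excerpt of the XML Schema numeric derivation tree.
PARENT = {"short": "int", "int": "long", "long": "integer", "integer": "decimal"}


def primitive_numeric(t):
    """Climb the derivation tree (subtype substitution) to a primitive numeric type."""
    while t in PARENT:
        t = PARENT[t]
    return t


def as_double(value, t):
    """Return (steps, value) showing how a numeric value of type t reaches xs:double."""
    steps = [t]
    base = primitive_numeric(t)
    if base != t:
        steps.append(base)          # subtype substitution, e.g. short -> decimal
        value = Decimal(value)
    if base != "double":
        steps.append("double")      # type promotion, e.g. decimal -> double
        value = float(value)
    return steps, value


if __name__ == "__main__":
    steps, four = as_double(4, "short")
    print(steps)                    # ['short', 'decimal', 'double']
    print(3.14159 < four)           # True
```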
> > > > >
> > > > > That's it for the moment ...
> > > > >
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > > [1] http://www.w3.org/TR/xquery/#id-value-comparisons
> > > > > [2] http://www.w3.org/TR/xquery/#id-general-comparisons
> > > > > [3] http://www.w3.org/TR/xquery/#mapping
> > > > > [4] http://www.w3.org/TR/xquery/#dt-type-promotion
> > > > > [5] http://www.w3.org/TR/xquery/#dt-subtype-substitution
> > > > > [6] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
> > > >
> > >
> > >
> > >
> >
> >
>
>
--
-eric
office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
Shonan Fujisawa Campus, Keio University,
5322 Endo, Fujisawa, Kanagawa 252-8520
JAPAN
+1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell: +81.90.6533.3882
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 21 April 2005 00:15:36 UTC