Re: my action item from Eric Prud'hommeaux on 2006-08-08 (public-rdf-dawg@w3.org from July to September 2006)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 8 Aug 2006 16:25:52 +0200
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: Pat Hayes <phayes@ihmc.us>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <20060808142552.GB6191@w3.org>
On Mon, Aug 07, 2006 at 12:55:08PM +0100, Seaborne, Andy wrote:
> 
> 
> 
> Eric Prud'hommeaux wrote:
> >On Fri, Aug 04, 2006 at 10:48:34AM +0100, Seaborne, Andy wrote:
> >>How about a scheme like this for comparison of literals:
> >>
> >>1/ Be explicit about value spaces; the design is comparison by-value.
> >>
> >>All operators return true if the implementation positively knows that the 
> >>two values compare as needed, return false if the implementation 
> >>positively knows that the two values do not compare as needed, and return 
> >>error if it does not know.
> >>
> >>http://www.w3.org/TR/xmlschema-2/#value-space
> >>
> >>2/ Define sop:value-compare(A, B) to be -1, 0 , 1 or error depending on 
> >>whether A and B are less than, equal, greater than, or it's an unknown 
> >>comparison.
> >>
> >>Note that sop:value-compare can be partial.  A processor always knows A 
> >>= B without much else if the lexical forms and datatypes match.
> >>
> >>3/ Define =, !=, <, <= , > , >= to be the relevant result(s) of 
> >>value-compare
> >>
> >>4/ State which datatypes that are required for a SPARQL engine (this 
> >>could even be less than the current set; xsd:int but not arbitrary length 
> >>integers; no decimals, or no dateTime which are a bit larger in 
> >> implementation costs).
> >>
> >>5/ Show that value-compare maps to the "XPath Tests" table for the 
> >>operators where an implementation provides them.
> >
> >I found it more intuitive to use the XPath tests directly. Proposal below.
> 
> I'm not sure what your position is with respect to points 1-4.  Is the 
> proposal below about how to express the use of F&O tests for 
> sop:value-compare for the known datatypes or is that text the only mapping 
> from grammar items to F&O?

Let's see, what is our critical requirement:
   monotonic WRT (returned) solutions

=> no query string has a naive interpretation that yields a non-error
   value and a clever interpretation that yields a different value.

=> no operator can be mapped to two operator functions, depending on
   the implementation, where the naive implementation yields a
   non-error and the clever implementation yields a different value.

eg. "'II'^^roman:numeral = 2" must yield an error in the naive
   implementation (or it could yield false (unequal strings), but
   *every* implementation would have to yield a false).
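
To make the requirement concrete, here is a minimal Python sketch of the
check. It is an illustration only, not spec text: the function names, the
(lexical form, datatype) tuples, the tiny Roman numeral lookup, and the use
of None to stand for "error" are all assumptions made for this example.

```python
from typing import Callable, Optional, Tuple

# A literal is modelled as (lexical form, datatype IRI). Illustrative only.
Literal = Tuple[str, str]
# None stands for "error"; True/False are definite answers.
Result = Optional[bool]

XSD_INTEGER = "xsd:integer"
ROMAN = "roman:numeral"  # hypothetical datatype used in this thread
ROMAN_VALUES = {"I": 1, "II": 2, "III": 3}  # just enough for the example


def naive_equal_fallback(a: Literal, b: Literal) -> Result:
    """Knows only xsd:integer; otherwise falls back to term comparison."""
    if a[1] == b[1] == XSD_INTEGER:
        return int(a[0]) == int(b[0])
    return a == b  # distinct terms give a definite False


def naive_equal_strict(a: Literal, b: Literal) -> Result:
    """Knows only xsd:integer; any other pairing is an error."""
    if a[1] == b[1] == XSD_INTEGER:
        return int(a[0]) == int(b[0])
    return None


def clever_equal(a: Literal, b: Literal) -> Result:
    """Additionally understands the hypothetical roman:numeral datatype."""
    def value(lit: Literal) -> Optional[int]:
        lex, datatype = lit
        if datatype == XSD_INTEGER:
            return int(lex)
        if datatype == ROMAN:
            return ROMAN_VALUES.get(lex)
        return None

    va, vb = value(a), value(b)
    if va is None or vb is None:
        return None
    return va == vb


def monotonic(naive: Callable[[Literal, Literal], Result],
              clever: Callable[[Literal, Literal], Result],
              a: Literal, b: Literal) -> bool:
    """A non-error naive answer must survive in the clever implementation."""
    n = naive(a, b)
    return n is None or n == clever(a, b)


two_roman = ("II", ROMAN)
two_int = ("2", XSD_INTEGER)
print(monotonic(naive_equal_fallback, clever_equal, two_roman, two_int))
print(monotonic(naive_equal_strict, clever_equal, two_roman, two_int))
```

The fallback variant commits to an answer that the clever implementation
later contradicts; the strict variant only ever reports an error, so
nothing has to be rescinded.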

I don't see how sop:value-compare helps us. We really need the
distinction in the query string, so that the same query can not be
interpreted two ways.

We could still handle the syntactic distinction separately and use
this intermediate value-compare, but it seems easier for us and for
consumers of the spec to map directly to F&O tests. For instance, each
F&O test has a defined boolean result which means we don't have to
invent a sop:value-compare return value that signifies != without
indicating less than or greater than.

> If it's the latter, the proposal is as before isn't it? - introduce 
> "sameLiteral" and reserve the syntax of infix operators for a fixed set of 
> datatypes?  Including "="?

Exactly, though now I'm thinking that sameNode is more appropriate
(explained below).

> More below.
> 
> >
> >>6/ = and != can be defined on non-literals by RDFterm-equal as currently.
> >>
> >>In terms of text change and test change and implementation impact, this 
> >>is actually quite a small change because it exactly agrees on the fixed 
> >>set of datatypes we already have.  It just permits extensibility through 
> >>the principle of value testing.
> >>
> >>An implementation can provide more datatypes as it chooses, meeting the 
> >>"Extensible Value Testing".  It is explicitly monotonic in the 
> >>capabilities of the processor.  But now legacy or other standards for 
> >>datatypes can be added smoothly (e.g. ISO 8601 date and time which is not 
> >>exactly the same as XSD dateTime).
> >>
> >>	Andy
> >>
> >>
> >>
> >>Pat Hayes wrote:
> >>>>On Tue, Aug 01, 2006 at 11:19:45AM -0700, Pat Hayes wrote:
> >>>>>Re. my action item from today's telecon.
> >>>>>
> >>>>>After looking at Andy's examples in
> >>>>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104.html
> >>>>>more closely, his example 6 seems to behave correctly for the issue
> >>>>>that you were raising, if I understand it properly. In which case no
> >>>>>further examples are needed, and my action item is moot.
> >>>>>
> >>>>>So let me see if I have got this right.
> >>>>>
> >>>>>My understanding of your concern was that we had a nonmonotonic
> >>>>>situation because a not-equal ( !=) filter, as in example 6, behaved
> >>>>>as follows: when faced with an unknown datatype, it would revert to a
> >>>>>string-not-equal test on the literal string, and so succeed when the
> >>>>>literal strings were distinct but the type URI matches; and then this
> >>>>>success might transform to a failure when better datatyping
> >>>>>information is available.
> >>>>Our measure of monotonicity is that adding knowledge to the system
> >>>>does not cause us to rescind conclusions. We should never get answers
> >>>>from the naive implementation that we don't get from the omniscient
> >>>>one (adding support for a datatype should not cause us to rescind
> >>>>answers).
> >>>Agreed.
> >>>
> >>>>The current text in rq2{3,4} has:
> >>>>
> >>>>[[
> >>>>When selecting the operator definition for a given set of parameters,
> >>>>the definition with the most specific parameters applies. For
> >>>>instance, when evaluating xsd:integer = xsd:signedInt, the definition
> >>>>for = with two numeric parameters applies, rather than the one with
> >>>>two RDF terms. The table is arranged so that the upper-most viable
> >>>>candidate is the most specific.
> >>>>...
> >>>>A != B	numeric	      numeric	    fn:not(op:numeric-equal(A, B))
> >>>>A != B	xsd:boolean   xsd:boolean   fn:not(op:boolean-equal(A, B))
> >>>>A != B	xsd:dateTime  xsd:dateTime  fn:not(op:dateTime-equal(A, B))
> >>>>...
> >>>>A != B	RDF term      RDF term	    fn:not(RDFterm-equal(A, B))
> >
> >PROPOSED: change the paragraph to:
> >
> >[[
> >11.3 Operator Mapping
> >
> >The SPARQL grammar identifies a set of operators (for instance, &&, *,
> >isIRI) used to construct constraints. The following table associates
> >each of these grammatical productions with the appropriate operands
> >and an operator function defined either by XPath or the SPARQL
> >operators specified in section 11.4. Operators invoked without
> >appropriate operands result in a type error.
> 
> This paragraph precludes adding new datatypes.  There are quite a few date 
> representations around so I think it is desirable to have a scheme that 
> allows existing data to be mapped to RDF and used with SPARQL without 
> needing a transformation of datatypes.

Agreed. This takes the conservative approach of defining a single
SPARQL language, rather than a set of languages that have a minimal
subset and have a prescribed behavior for certain extensions. The
question is whether the implementation that understands roman:numerals
is still SPARQL, or just some benevolent extension.

At present, "implementation" occurs once in the normative part of the
document:
[[
RDF Concepts and Abstract Syntax "anticipates an RFC on
Internationalized Resource Identifiers. Implementations may issue
warnings concerning the use of RDF URI References that do not conform
with [IRI draft] or its successors."
]]

If we want to make SPARQL specifically embrace (or tolerate) certain
extensions, we have lots of work to do outside of this text. Unless it
is patently clear that this must affect our strategy, I propose that
we handle it separately (i.e. wordsmith everything at the same time).

For example, 11.1 Operand Data Types currently enumerates the
datatypes in the SPARQL language. We'll have to scratch our heads and
decide if we want to continue with DanC's don't-say-implementation
approach, or start to define compatibility levels.


> >
> >SPARQL follows XPath's scheme for numeric type promotions and subtype
> >substitution for arguments to numeric operators. The XPath Operator
> >Mapping rules for operands of type {xs:integer, xs:decimal, xs:float,
> >xs:double} or any derivative types apply to SPARQL operators as well.
> >For instance, when evaluating xsd:integer = xsd:signedInt, the
> >definition for = with two numeric parameters applies.  Some of the
> >operators are associated with nested function expressions,
> >e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
> >definitions, fn:not and op:numeric-equal produce an error if their
> >argument is an error.
> >]]
> >
> >This is a little more formal than the current text, but does not go so
> >far as to quote the relevant 3 paragraphs and table from
> >http://www.w3.org/TR/xpath20/#mapping .
> >
> >We also need to change the operators for RDFterm-equal(A, B):
> >('-' and '+' indicate rows removed from, and added to the table.)
> >[[
> >- A = B		       RDF term	  RDF term   RDFterm-equal(A, B)
> >- A != B	       RDF term	  RDF term   fn:not(RDFterm-equal(A, B))
> >+ A = B		       IRI	  IRI	     RDFterm-equal(A, B)
> >+ A = B		       blank node blank node RDFterm-equal(A, B)
> >+ A != B	       IRI	  IRI	     RDFterm-equal(A, B)
> >+ A != B	       blank node blank node RDFterm-equal(A, B)
> 
> fn:not(RDFterm-equal(A, B)) ?

yeah, what you said.

> >+ sameLITERAL(A, B)    literal   literal     RDFterm-equal(A, B)
> >]]
> >
> >and in grammar rule 57, we add:
> >[[
> >	| 'sameLITERAL' '(' Expression ',' Expression ')' 
> >]]
> >
> >I think this is what Andy meant by "Be explicit about value spaces".
> 
> Not really.  I don't see any discussion of value spaces.  If those are the 
> only changes, the text still precludes adding understanding of a new 
> datatype (examples: xsd:date or ISO 8601 date and time) and having the 
> syntax for "<" work.  A design that does work for "<" can also work for "=" 
> and "!="
> 
> Examples:
> 
> SELECT ?x { ?x :p ?v } ORDER BY ?v
>   or
> SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v < ?w ) }
> 
> for ?v, ?w being xsd:dates or ISO 8601 dates (including mixed).
> 
> The fact that
> 
> SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v = ?w ) }
> 
> works only for a fixed set of types is rather confusing.

"Confusing"? or "limiting"...


> The point about being explicit about value spaces is to make the set of 
> datatypes a processor can provide be an open set, not a closed set, if it 
> follows the principle that a processor tests based on compatible value 
> spaces and only returns true/false if it definitely knows that relationship 
> to be true/false. That open set can be smaller or larger than the current 
>  list.
> 
>   "<" maps to sop:value-compare(A,B) == -1
> and
>   sop:value-compare(numeric,numeric) == -1
> maps to op:numeric-less-than(A, B)  (well - fn:numeric-compare would be 
> nice!)
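
For readers following along, the value-compare scheme described above could
be sketched roughly as follows in Python. This is an illustration under
stated assumptions, not anything from the spec or the F&O documents: the
PARSERS registry, the CompareError exception, and the shortcut of parsing
every registered datatype into a single Decimal value space are all
invented for the example.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, Tuple

Literal = Tuple[str, str]  # (lexical form, datatype IRI), illustrative


class CompareError(Exception):
    """The processor does not positively know how the values compare."""


# Open registry: datatype IRI -> parser from lexical form to a value.
# A processor may register more (or fewer) datatypes. Treating them all
# as one numeric value space is a simplification for this sketch.
PARSERS: Dict[str, Callable[[str], Decimal]] = {
    "xsd:integer": Decimal,
    "xsd:decimal": Decimal,
}


def value_compare(a: Literal, b: Literal) -> int:
    """Return -1, 0 or 1, or raise CompareError if the answer is unknown."""
    # Same lexical form and same datatype: equal without knowing the type.
    if a == b:
        return 0
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a not in PARSERS or dt_b not in PARSERS:
        raise CompareError(f"cannot compare {dt_a} with {dt_b}")
    try:
        va, vb = PARSERS[dt_a](lex_a), PARSERS[dt_b](lex_b)
    except (InvalidOperation, ValueError) as exc:
        raise CompareError("ill-formed lexical form") from exc
    return (va > vb) - (va < vb)


# The infix operators are read off the result of value_compare.
# An error simply propagates, so "!=" on unknown types is still an error.
def less_than(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) == -1


def equal(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) == 0


def not_equal(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) != 0
```

With this shape, adding a datatype means adding a registry entry; pairs that
previously raised CompareError may start returning answers, but no existing
true/false result changes.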

Aha! If I understand, value-compare is not intended to solve the
syntactic ambiguity of the current '=' and '!=' operators, but instead
to provide an alternate operator function that is not constrained to
existing XML Schema datatypes as the F&O operator functions are.

Am now quite convinced this is separate from the monotonicity issue.

> >I have not done a survey of what tests would need to change; any that do
> >a positive literal comparison based on an invocation of RDFterm-equal
> >(as currently implied by '=' or '!=' on literals that are not both of
> >known data types).
> >
> >
> >In general, I think this text follows XPath nicely, and provides more
> >clarity for implementors. Users have to get over having to explicitly
> >state when they want tests to not depend on data type support, but I
> >think I've shown that that is necessary to meet our monotonicity
> >constraints (though I didn't include "Q.E.D." as PatH had advised).
> 
> Sorry - I don't understand that.  Why is it necessary to have sameLiteral 
> to meet the monotonicity goal, rather than being one way to meet the 
> monotonicity goal?

I meant that I had proven that users have to "explicitly state when
they want tests to not depend on data type support." This is handled
by the operator mapping and the change to the grammar. There are other
ways to change the operator mapping and grammar to meet this goal.
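
As a rough illustration of that operator mapping, here is a Python sketch
of a table-driven dispatcher. The Term class, the predicate helpers, and
SparqlTypeError are hypothetical names for this example; the rows follow
the proposed table (with fn:not applied on the != rows), numeric comparison
via Decimal is a simplification, and derived numeric types and the other
built-in datatypes are left out for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal


class SparqlTypeError(Exception):
    """No row of the operator table accepts these operands."""


@dataclass(frozen=True)
class Term:
    kind: str           # "iri", "bnode" or "literal"
    lexical: str
    datatype: str = ""  # only meaningful for literals


NUMERIC = {"xsd:integer", "xsd:decimal", "xsd:float", "xsd:double"}


def is_numeric(t: Term) -> bool:
    return t.kind == "literal" and t.datatype in NUMERIC


def is_iri(t: Term) -> bool:
    return t.kind == "iri"


def is_bnode(t: Term) -> bool:
    return t.kind == "bnode"


def is_literal(t: Term) -> bool:
    return t.kind == "literal"


def rdfterm_equal(a: Term, b: Term) -> bool:
    return a == b


def numeric_equal(a: Term, b: Term) -> bool:
    return Decimal(a.lexical) == Decimal(b.lexical)


# (operator, test for A, test for B, operator function)
OPERATOR_TABLE = [
    ("=", is_numeric, is_numeric, numeric_equal),
    ("!=", is_numeric, is_numeric, lambda a, b: not numeric_equal(a, b)),
    ("=", is_iri, is_iri, rdfterm_equal),
    ("=", is_bnode, is_bnode, rdfterm_equal),
    ("!=", is_iri, is_iri, lambda a, b: not rdfterm_equal(a, b)),
    ("!=", is_bnode, is_bnode, lambda a, b: not rdfterm_equal(a, b)),
    ("sameLITERAL", is_literal, is_literal, rdfterm_equal),
]


def apply_operator(op: str, a: Term, b: Term) -> bool:
    """Use the first matching row; no match is a type error."""
    for name, accepts_a, accepts_b, fn in OPERATOR_TABLE:
        if name == op and accepts_a(a) and accepts_b(b):
            return fn(a, b)
    raise SparqlTypeError(f"{op} is not defined for these operands")


two = Term("literal", "2", "xsd:integer")
roman = Term("literal", "II", "roman:numeral")
```

Here apply_operator("!=", two, roman) raises SparqlTypeError in every
implementation, while apply_operator("sameLITERAL", two, roman) is False in
every implementation, so the user states in the query which test is meant.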

> Specifically, in what way does the use of sop:value-compare, which is 
> defined to be an error when the comparison can not be done on some pair of 
> values, violate monotonicity?  What's the counter-example?
> 
> For the Roman numeral example below,
> 
>   sop:value-compare("II", 2) is either true or error.
> 
> That means that applying fn:not will not come up with a non-monotonic 
> situation because fn:not(error) is still an error and an error means 
> something is not included in the result set.

But the person is not typing "sop:value-compare('II', 2)", they are
typing "'II' = 2", which, unless we make the syntactic distinction I
proposed, could mean RDFterm-equal('II', 2), which would simply be
false in the naive case, and true in the more clever case.

> [
> By the way:
> 
> http://www.w3.org/TR/xmlschema-2/#decimal-lexical-representation
> Decimal is limited to 0-9, ., + and - so Roman numerals can't be a 
> restriction.
> ]

Rats! It is such a handy example. For the purposes of our discussion,
let's ignore this.

> 	Andy
> 
> >>>> The naive implementation sees
> >>>>  "2"^^xsd:integer != "II"^^roman:numeral
> >>>> and says "are they both numerics? no, boolean? no ... RDF terms? yes"
> >>>> and does the RDFterm-equal test. They are not the same term so the
> >>>> answer is TRUE (remember, *not* equal).
> >>> OK, I agree this is broken as written, but then this also seems to be
> >>> at odds with test 6 in that test suite. So I guess my point is,
> >>> regardless of what the spec currently says, those tests illustrate
> >>> what the right behavior OUGHT to be, which would be that a != between
> >>> two literals with unknown datatypes is simply unknown, and can never
> >>> succeed, regardless of the RDF term equality result between them. So,
> >>> reverting now to my very limited action item, I don't need to tweak
> >>> those tests or add to them in order to show what the result SHOULD
> >>> be. Right?
> >>>
> >>>> Some wise-guy adds support for roman:numeral to make the omniscient
> >>>> implementation from the following schema (note: restriction of decimal):
> >>>>
> >>>>  <xs:simpleType name="numeral" id="numeral">
> >>>>    <xs:restriction base="xs:decimal">
> >>>>      <xs:fractionDigits fixed="true" value="0"
> >>>> id="romanNumeral.fractionDigits"/>
> >>>>      <xs:pattern value="[IVDXLC]+"/>
> >>>>      <xs:minInclusive value="0" id="romanNumeral.minInclusive"/>
> >>>>    </xs:restriction>
> >>>>  </xs:simpleType>
> >>>>
> >>>> Now the implementation says "are they both decimals? yep" and returns
> >>>> FALSE (II is *not* != 2), causing us to lose an answer that we had in
> >>>> the naive implementation.
> >>>>
> >>>>
> >>>>> But this is not what the test examples indicate. With this rule, in
> >>>>> case #6, it would give the answer binding [ x/x1, v/"b"^^t:type1 ],
> >>>>> but in fact it does not: it gives no answers, as it should in order
> >>>>> to be monotonic when more datatype information is available. And the
> >>>>> comment on test 6 seems to indicate that 'no result' is determined
> >>>>> in this case for reasons of preserving monotonicity, and works
> >>>>> symmetrically for equality and not-equality.
> >>>> I believe that this test does illustrate the problem. I can concoct a
> >>>> type system where the two are, in cleverer systems, known to be the
> >>>> same value.
> >>> Right, and in that case - following now the behavior indicated by the
> >>> example, not by the spec text you cite - the behavior will be
> >>> indistinguishable from what it is now (no answers) but if you instead
> >>> concoct a system in which they have different values, then the query
> >>> will succeed. So either way, we get monotonic behavior. Again, note I
> >>> am not following the first-line-in-table rule here, but the behavior
> >>> as specified in the test suite email: they give different results on
> >>> test 6.
> >>>
> >>> So, if we follow the rule as illustrated by test 6, which as I read
> >>> the test is that when either of A or B is typed with an unknown
> >>> datatype, then  A != B test always fails while A=B succeeds only when
> >>> A and B are the exact same literal string and same datatype URI, then
> >>> we don't need to do anything about extending the equality. Right?
> >>>
> >>> Pat
> >>>
> >>>> Therefore, we need to spell it
> >>>>
> >>>> SELECT *
> >>>> { ?x :p ?v
> >>>>     FILTER ( ?v !sameLiteral "a"^^t:type1 )
> >>>> }
> >>>>
> >>>> or something like this.
> >>>>
> >>>>> So, either the tests are OK, or I have misunderstood your point.
> >>>>>
> >>>>> Eric? Or indeed, anyone with anything useful to say?
> >>>>>
> >>>>> Pat
> >>>>> --
> >>>>> ---------------------------------------------------------------------
> >>>>> IHMC		(850)434 8903 or (650)494 3973   home
> >>>>> 40 South Alcaniz St.	(850)202 4416   office
> >>>>> Pensacola			(850)202 4440   fax
> >>>>> FL 32502			(850)291 0667    cell
> >>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> >>>>>
> >>>> --
> >>>> -eric
> >>>>
> >>>> home-office: +1.617.395.1213 (usually 900-2300 CET)
> >>>> 	    +33.1.45.35.62.14
> >>>> cell:       +33.6.73.84.87.26
> >>>>
> >>>> (eric@w3.org)
> >>>> Feel free to forward this message to any list for any purpose other than
> >>>> email address distribution.
> >>>
> >
> 

-- 
-eric

home-office: +1.617.395.1213 (usually 900-2300 CET)
	    +33.1.45.35.62.14
cell:       +33.6.73.84.87.26

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Tuesday, 8 August 2006 14:24:46 UTC