Re: my action item from Seaborne, Andy on 2006-08-09 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 09 Aug 2006 10:04:11 +0100
To: Eric Prud'hommeaux <eric@w3.org>
CC: Pat Hayes <phayes@ihmc.us>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <44D9A50B.6040301@hp.com>

Eric Prud'hommeaux wrote:
> On Mon, Aug 07, 2006 at 12:55:08PM +0100, Seaborne, Andy wrote:
>>
>>
>> Eric Prud'hommeaux wrote:
>>> On Fri, Aug 04, 2006 at 10:48:34AM +0100, Seaborne, Andy wrote:
>>>> How about a scheme like this for comparison of literals:
>>>>
>>>> 1/ Be explicit about value spaces; the design is comparison by-value.
>>>>
>>>> All operators return true if the implementation positively knows that the 
>>>> two values compare as needed, return false if the implementation 
>>>> positively knows that the two values do not compare as needed, and return 
>>>> an error if it does not know.
>>>>
>>>> http://www.w3.org/TR/xmlschema-2/#value-space
>>>>
>>>> 2/ Define sop:value-compare(A, B) to be -1, 0 , 1 or error depending on 
>>>> whether A and B are less than, equal, greater than, or it's an unknown 
>>>> comparison.
>>>>
>>>> Note that sop:value-compare can be partial.  A processor always knows A 
>>>> = B without much else if the lexical forms and datatypes match.
>>>>
>>>> 3/ Define =, !=, <, <= , > , >= to be the relevant result(s) of 
>>>> value-compare
>>>>
>>>> 4/ State which datatypes are required for a SPARQL engine (this 
>>>> could even be less than the current set; xsd:int but not arbitrary length 
>>>> integers; no decimals, or no dateTime which are a bit larger in 
>>>> implementation costs).
>>>>
>>>> 5/ Show that value-compare maps to the "XPath Tests" table for the 
>>>> operators where an implementation provides them.
>>> I found it more intuitive to use the XPath tests directly. Proposal below.
>> I'm not sure what your position is with respect to points 1-4.  Is the 
>> proposal below about how to express the use of F&O tests for 
>> sop:value-compare for the known datatypes or is that text the only mapping 
>> from grammar items to F&O?
> 
> Let's see, what is our critical requirement:
>    monotonic WRT (returned) solutions
> 
> => no query string has a naive interpretation that yields a non-error
>    value and a clever interpretation that yields a different value.
> 
> => no operator can be mapped to two operator functions, depending on
>    the implementation, where the naive implementation yields a
>    non-error and the clever implementation yields a different value.
> 
> eg. "'II'^^roman:numeral = 2" must yield an error in the naive
>    implementation (or it could yield false (unequal strings), but
>    *every* implementation would have to yield a false).
> 
> I don't see how sop:value-compare helps us. We really need the
> distinction in the query string, so that the same query can not be
> interpreted two ways.

value-compare('II'^^roman:numeral, 2) is either error or EQ, which meets the 
first point above (with value-compare(A,B) returning one of error, LT, GT, NE, 
EQ).

Every operator maps to value-compare(A,B).

= and != are always value-testing for literals.
They are

We should retain the current table for mapping for the XSD datatypes mentioned 
because that covers the most common cases for users and for implementers.
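
To make the scheme concrete, here is a rough sketch in Python. It is an illustration only, not text from any draft: the names Cmp, Literal, NAIVE, CLEVER and parse_roman are invented, the roman:numeral IRI is hypothetical, and the sketch assumes every supported datatype maps into one shared integer value space. Mapping NE to an error for "<" is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Cmp(Enum):
    LT = -1
    EQ = 0
    GT = 1
    NE = 2      # known to differ, ordering unknown (never produced by this toy)
    ERROR = 3   # the processor does not know


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
ROMAN = "http://example.org/roman#numeral"  # hypothetical datatype IRI

Parsers = Dict[str, Callable[[str], int]]


def parse_roman(lexical: str) -> int:
    # Toy lexical-to-value mapping; just enough for the example.
    table = {"I": 1, "II": 2, "III": 3}
    if lexical not in table:
        raise ValueError(lexical)
    return table[lexical]


NAIVE: Parsers = {XSD_INTEGER: int}                       # knows xsd:integer only
CLEVER: Parsers = {XSD_INTEGER: int, ROMAN: parse_roman}  # also knows roman:numeral


def value_compare(a: Literal, b: Literal, parsers: Parsers) -> Cmp:
    # Same lexical form and same datatype: equality is always known.
    if a == b:
        return Cmp.EQ
    pa, pb = parsers.get(a.datatype), parsers.get(b.datatype)
    if pa is None or pb is None:
        return Cmp.ERROR  # unsupported datatype: do not guess
    try:
        va, vb = pa(a.lexical), pb(b.lexical)
    except ValueError:
        return Cmp.ERROR  # ill-formed lexical form
    return Cmp.LT if va < vb else Cmp.GT if va > vb else Cmp.EQ


# Operators return True, False, or None (None stands for "error").
def op_eq(a: Literal, b: Literal, parsers: Parsers) -> Optional[bool]:
    c = value_compare(a, b, parsers)
    return None if c is Cmp.ERROR else c is Cmp.EQ


def op_ne(a: Literal, b: Literal, parsers: Parsers) -> Optional[bool]:
    c = value_compare(a, b, parsers)
    return None if c is Cmp.ERROR else c is not Cmp.EQ


def op_lt(a: Literal, b: Literal, parsers: Parsers) -> Optional[bool]:
    c = value_compare(a, b, parsers)
    # NE says nothing about ordering, so "<" is treated as unknown here.
    return None if c in (Cmp.ERROR, Cmp.NE) else c is Cmp.LT
```

With these definitions, op_ne(Literal("II", ROMAN), Literal("2", XSD_INTEGER), NAIVE) is an error rather than true, and the same call with CLEVER is false, so adding support for the datatype does not take back an answer that the naive processor gave.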

> We could still handle the syntactic distinction separately and use
> this intermediate value-compare, but it seems easier for us and for
> consumers of the spec to map directly to F&O tests. For instance, each
> F&O test has a defined boolean result which means we don't have to
> invent a sop:value-compare return value that signifies != without
> indicating less than or greater than.
> 
>> If it's the latter, the proposal is as before isn't it? - introduce 
>> "sameLiteral" and reserve the syntax of infix operators for a fixed set of 
>> datatypes?  Including "="?
> 
> Exactly, though now I'm thinking that sameNode is more appropriate
> (explained below).

sameNode is a better name (its definition is the same as RDFTermEquals).

> 
>> More below.

I think we covered the rest in the telecon.  If not, prompt me.

From the telecon:

The table we had was:

         |   IRI    |  bNode   |  Literal
---------+----------+----------+-----------
IRI      |  T or F  |    F     |     F
bNode    |    F     |  T or F  |     F
Literal  |    F     |    F     | E, T or F

(T = true, F = false, E = error)

literal/literal is a value comparison.

There is a function sameNode(A,B).  It is used to implement "=" and "!=" for 
IRI/IRI and bNode/bNode in the table above.  It is also callable as a function 
on literals where it is a syntactic test.
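
One way to read the table, as a hedged Python sketch: the class names, EvalError, and the choice of xsd:integer as the only known datatype are assumptions made for illustration, not anything agreed on the call.

```python
from dataclasses import dataclass
from typing import Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


Term = Union[IRI, BNode, Literal]


class EvalError(Exception):
    """Stands for the 'E' entry in the table."""


def same_node(a: Term, b: Term) -> bool:
    # Purely syntactic: same kind of term with identical components.
    return type(a) is type(b) and a == b


def literal_value_equal(a: Literal, b: Literal) -> bool:
    # Toy value comparison: this processor understands xsd:integer only.
    if same_node(a, b):
        return True
    if a.datatype != XSD_INTEGER or b.datatype != XSD_INTEGER:
        raise EvalError("cannot compare values of an unsupported datatype")
    try:
        return int(a.lexical) == int(b.lexical)
    except ValueError:
        raise EvalError("ill-formed xsd:integer literal")


def equals(a: Term, b: Term) -> bool:
    """The '=' operator according to the table."""
    if isinstance(a, Literal) and isinstance(b, Literal):
        return literal_value_equal(a, b)   # E, T or F
    if type(a) is type(b):
        return same_node(a, b)             # IRI/IRI or bNode/bNode: T or F
    return False                           # mixed kinds: F
```

In this sketch "01"^^xsd:integer = "1"^^xsd:integer is true by value, while same_node on the same two literals is false because the lexical forms differ.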

Fred suggested an isError(expression) operator that returns T if the expression 
evaluates to an error, and F otherwise.  This is to facilitate data-validating 
applications.
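
A minimal sketch of how isError might behave, in Python. It assumes evaluation errors are modelled as an exception (EvalError) and that the expression is passed unevaluated as a callable; integer_value is a hypothetical helper standing in for any expression that can fail.

```python
from typing import Callable


class EvalError(Exception):
    """Raised when an expression cannot be evaluated."""


def integer_value(lexical: str) -> int:
    # Hypothetical helper: the value of an xsd:integer lexical form.
    try:
        return int(lexical)
    except ValueError:
        raise EvalError("not a valid integer: " + repr(lexical))


def is_error(expression: Callable[[], object]) -> bool:
    # True if evaluating the expression raises an evaluation error,
    # False if it evaluates to any value at all.
    try:
        expression()
    except EvalError:
        return True
    return False
```

A validating application could then flag bad data with something like is_error(lambda: integer_value("abc")), which is true here, whereas the same test on "42" is false.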

	Andy

>>
>>>> 6/ = and != can be defined on non-literals by RDFterm-equal, as currently.
>>>>
>>>> In terms of text change and test change and implementation impact, this 
>>>> is actually quite a small change because it exactly agrees on the fixed 
>>>> set of datatypes we already have.  It just permits extensibility through 
>>>> the principle of value testing.
>>>>
>>>> An implementation can provide more datatypes as it chooses, meeting the 
>>>> "Extensible Value Testing".  It is explicitly monotonic in the 
>>>> capabilities of the processor.  But now legacy or other standards for 
>>>> datatypes can be added smoothly (e.g. ISO 8601 date and time which is not 
>>>> exactly the same as XSD dateTime).
>>>>
>>>> 	Andy
>>>>
>>>>
>>>>
>>>> Pat Hayes wrote:
>>>>>> On Tue, Aug 01, 2006 at 11:19:45AM -0700, Pat Hayes wrote:
>>>>>>> Re. my action item from today's telecon.
>>>>>>>
>>>>>>> After looking at Andy's examples in
>>>>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104.html
>>>>>>> more closely, his example 6 seems to behave correctly for the issue
>>>>>>> that you were raising, if I understand it properly. In which case no
>>>>>>> further examples are needed, and my action item is moot.
>>>>>>>
>>>>>>> So let me see if I have got this right.
>>>>>>>
>>>>>>> My understanding of your concern was that we had a nonmonotonic
>>>>>>> situation because a not-equal ( !=) filter, as in example 6, behaved
>>>>>>> as follows: when faced with an unknown datatype, it would revert to a
>>>>>>> string-not-equal test on the literal string, and so succeed when the
>>>>>>> literal strings were distinct but the type URI matches; and then this
>>>>>>> success might transform to a failure when better datatyping
>>>>>>> information is available.
>>>>>> Our measure of monotonicity is that adding knowledge to the system
>>>>>> does not cause us to rescind conclusions. We should never get answers
>>>>>> from the naive implementation that we don't get from the omniscient
>>>>>> one (adding support for a datatype should not cause us to rescind
>>>>>> answers).
>>>>> Agreed.
>>>>>
>>>>>> The current text in rq2{3,4} has:
>>>>>>
>>>>>> [[
>>>>>> When selecting the operator definition for a given set of parameters,
>>>>>> the definition with the most specific parameters applies. For
>>>>>> instance, when evaluating xsd:integer = xsd:signedInt, the definition
>>>>>> for = with two numeric parameters applies, rather than the one with
>>>>>> two RDF terms. The table is arranged so that upper-most viable
>>>>>> candidate is the most specific.
>>>>>> ...
>>>>>> A != B	numeric	      numeric	    fn:not(op:numeric-equal(A, B))
>>>>>> A != B	xsd:boolean   xsd:boolean   fn:not(op:boolean-equal(A, B))
>>>>>> A != B	xsd:dateTime  xsd:dateTime  fn:not(op:dateTime-equal(A, B))
>>>>>> ...
>>>>>> A != B	RDF term      RDF term	    fn:not(RDFterm-equal(A, B))
>>> PROPOSED: change the paragraph to:
>>>
>>> [[
>>> 11.3 Operator Mapping
>>>
>>> The SPARQL grammar identifies a set of operators (for instance, &&, *,
>>> isIRI) used to construct constraints. The following table associates
>>> each of these grammatical productions with the appropriate operands
>>> and an operator function defined either by XPath or the SPARQL
>>> operators specified in section 11.4. Operators invoked without
>>> appropriate operands result in a type error.
>> This paragraph precludes adding new datatypes.  There are quite a few date 
>> representations around so I think it is desirable to have a scheme that 
>> allows existing data to be mapped to RDF and used with SPARQL without 
>> needing a transformation of datatypes.
> 
> Agreed. This takes the conservative approach of defining a single
> SPARQL language, rather than a set of languages that have a minimal
> subset and have a prescribed behavior for certain extensions. The
> question is whether the implementation that understands roman:numerals
> is still SPARQL, or just some benevolent extension.
> 
> At present, "implementation" occurs once in the normative part of the
> document:
> [[
> RDF Concepts and Abstract Syntax "anticipates an RFC on
> Internationalized Resource Identifiers. Implementations may issue
> warnings concerning the use of RDF URI References that do not conform
> with [IRI draft] or its successors."
> ]]
> 
> If we want to make SPARQL specifically embrace (or tolerate) certain
> extensions, we have lots of work to do outside of this text. Unless it
> is patently clear that this must affect our strategy, I propose that
> we handle it separately (i.e. wordsmith everything at the same time).  
> 
> For example, 11.1 Operand Data Types currently enumerates the
> datatypes in the SPARQL language. We'll have to scratch our heads and
> decide if we want to continue with DanC's don't-say-implementation
> approach, or start to define compatibility levels.
> 
> 
>>> SPARQL follows XPath's scheme for numeric type promotions and subtype
>>> substitution for arguments to numeric operators. The XPath Operator
>>> Mapping rules for operands of type {xs:integer, xs:decimal, xs:float,
>>> xs:double} or any derivative types apply to SPARQL operators as well.
>>> For instance, when evaluating xsd:integer = xsd:signedInt, the
>>> definition for = with two numeric parameters applies.  Some of the
>>> operators are associated with nested function expressions,
>>> e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
>>> definitions, fn:not and op:numeric-equal produce an error if their
>>> argument is an error.
>>> ]]
>>>
>>> This is a little more formal than the current text, but does not go so
>>> far as to quote the relevant 3 paragraphs and table from
>>> http://www.w3.org/TR/xpath20/#mapping .
>>>
>>> We also need to change the operators for RDFterm-equal(A, B):
>>> ('-' and '+' indicate rows removed from, and added to the table.)
>>> [[
>>> - A = B		       RDF term	  RDF term   RDFterm-equal(A, B)
>>> - A != B	       RDF term	  RDF term   fn:not(RDFterm-equal(A, B))
>>> + A = B		       IRI	  IRI	     RDFterm-equal(A, B)
>>> + A = B		       blank node blank node RDFterm-equal(A, B)
>>> + A != B	       IRI	  IRI	     RDFterm-equal(A, B)
>>> + A != B	       blank node blank node RDFterm-equal(A, B)
>> fn:not(RDFterm-equal(A, B)) ?
> 
> yeah, what you said.
> 
>>> + sameLITERAL(A, B)    literal   literal     RDFterm-equal(A, B)
>>> ]]
>>>
>>> and in grammar rule 57, we add:
>>> [[
>>> 	| 'sameLITERAL' '(' Expression ',' Expression ')' 
>>> ]]
>>>
>>> I think this is what Andy meant by "Be explicit about value spaces".
>> Not really.  I don't see any discussion of value spaces.  If those are the 
>> only changes, the text still precludes adding understanding of a new 
>> datatype (examples: xsd:date or ISO 8601 date and time) and having the 
>> syntax for "<" work.  A design that does work for "<" can also work for "=" 
>> and "!="
>>
>> Examples:
>>
>> SELECT ?x { ?x :p ?v } ORDER BY ?v
>>   or
>> SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v < ?w ) }
>>
>> for ?v, ?w being xsd:dates or ISO 8601 dates (including mixed).
>>
>> The fact that
>>
>> SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v = ?w ) }
>>
>> works only for a fixed set of types is rather confusing.
> 
> "Confusing"? or "limiting"...
> 
> 
>> The point about being explicit about value spaces is to make the set of 
>> datatypes a processor can provide be an open set, not a closed set, if it 
>> follows the principle that a processor tests based on compatible value 
>> spaces and only returns true/false if it definitely knows that relationship 
>> to be true/false. That open set can be smaller or larger than the current 
>> list.
>>
>>   "<" maps to sop:value-compare(A,B) == -1
>> and
>>   sop:value-compare(numeric,numeric) == -1
>> maps to op:numeric-less-than(A, B)  (well - fn:numeric-compare would be 
>> nice!)
> 
> Aha! If I understand, value-compare is not intended to solve the
> syntactic ambiguity of the current '=' and '!=' operators, but instead
> to provide an alternate operator function that is not constrained to
> existing XML Schema datatypes as the F&O operator functions are.
> 
> Am now quite convinced this is separate from the monotonicity issue.
> 
>>> I
>>> have not done a survey of what tests would need to change; any that do
>>> a positive literal comparison based on an invocation of RDFterm-equal
>>> (as currently implied by '=' or '!=' on literals that are not both of
>>> known data types).
>>>
>>>
>>> In general, I think this text follows XPath nicely, and provides more
>>> clarity for implementors. Users have to get over having to explicitly
>>> state when they want tests to not depend on data type support, but I
>>> think I've shown that that is necessary to meet our monotonicity
>>> constraints (though I didn't include "Q.E.D." as PatH had advised).
>> Sorry - I don't understand that.  Why is it necessary to have sameLiteral 
>> to meet the monotonicity goal, rather than being one way to meet the 
>> monotonicity goal?
> 
> I meant that I had proven that users have to "explicitly state when
> they want tests to not depend on data type support." This is handled
> by the operator mapping and the change to the grammar. There are other
> ways to change the operator mapping and grammar to meet this goal.
> 
>> Specifically, in what way does the use of sop:value-compare, which is 
>> defined to be an error when the comparison can not be done on some pair of 
>> values, violate monotonicity?  What's the counter-example?
>>
>> For the Roman numeral example below,
>>
>>   sop:value-compare("II", 2) is either true or error.
>>
>> That means that applying fn:not will not come up with a non-monotonic 
>> situation because fn:not(error) is still an error and an error means 
>> something is not included in the result set.
> 
> But the person is not typing "sop:value-compare('II', 2)", they are
> typing "'II' = 2", which, unless we make the syntactic distinction I
> proposed, could mean RDFterm-equal('II', 2), which would simply be
> false in the naive case, and true in the more clever case.
> 
>> [
>> By the way:
>>
>> http://www.w3.org/TR/xmlschema-2/#decimal-lexical-representation
>> Decimal is limited to 0-9, ., + and - so Roman numerals can't be a 
>> restriction.
>> ]
> 
> rats! it is such a handy example. For the purposes of our discussion,
> let's ignore this.
> 
>> 	Andy
>>
>>>>>> The naive implementation sees
>>>>>>  "2"^^xsd:integer != "II"^^roman:numeral
>>>>>> and says "are they both numerics? no, boolean? no ... RDF terms? yes"
>>>>>> and does the RDFterm-equal test. They are not the same term so the
>>>>>> answer is TRUE (remember, *not* equal).
>>>>> OK, I agree this is broken as written, but then this also seems to be
>>>>> at odds with test 6 in that test suite. So I guess my point is,
>>>>> regardless of what the spec currently says, those tests illustrate
>>>>> what the right behavior OUGHT to be, which would be that a != between
>>>>> two literals with unknown datatypes is simply unknown, and can never
>>>>> succeed, regardless of the RDF term equality result between them. So,
>>>>> reverting now to my very limited action item, I don't need to tweak
>>>>> those tests or add to them in order to show what the result SHOULD
>>>>> be. Right?
>>>>>
>>>>>> Some wise-guy adds support for roman:numeral to make the omniscient
>>>>>> implementation from the following schema (note: restriction of decimal):
>>>>>>  <xs:simpleType name="numeral" id="numeral">
>>>>>>    <xs:restriction base="xs:decimal">
>>>>>>      <xs:fractionDigits fixed="true" value="0"
>>>>>> id="romanNumeral.fractionDigits"/>
>>>>>>      <xs:pattern value="[IVDXLC]+"/>
>>>>>>      <xs:minInclusive value="0" id="romanNumeral.minInclusive"/>
>>>>>>    </xs:restriction>
>>>>>>  </xs:simpleType>
>>>>>>
>>>>>> Now the implementation says "are they both decimals? yep" and returns
>>>>>> FALSE (II is *not* != 2), causing us to lose an answer that we had in
>>>>>> the naive implementation.
>>>>>>
>>>>>>
>>>>>>> But this is not what the test examples indicate. With this rule, in
>>>>>>> case #6, it would give the answer binding [ x/x1, v/"b"^^t:type1 ],
>>>>>>> but in fact it does not: it gives no answers, as it should in order
>>>>>>> to be monotonic when more datatype information is available. And the
>>>>>>> comment on test 6 seems to indicate that 'no result' is determined
>>>>>>> in this case for reasons of preserving monotonicity, and works
>>>>>>> symmetrically for equality and not-equality.
>>>>>> I believe that this test does illustrate the problem. I can concoct a
>>>>>> type system where the two are, in cleverer systems, known to be the
>>>>>> same value.
>>>>> Right, and in that case - following now the behavior indicated by the
>>>>> example, not by the spec text you cite - the behavior will be
>>>>> indistinguishable from what it is now (no answers) but if you instead
>>>>> concoct a system in which they have different values, then the query
>>>>> will succeed. So either way, we get monotonic behavior. Again, note I
>>>>> am not following the first-line-in-table rule here, but the behavior
>>>>> as specified in the test suite email: they give different results on
>>>>> test 6.
>>>>>
>>>>> So, if we follow the rule as illustrated by test 6, which as I read
>>>>> the test is that when either of A or B is typed with an unknown
>>>>> datatype, then  A != B test always fails while A=B succeeds only when
>>>>> A and B are the exact same literal string and same datatype URI, then
>>>>> we don't need to do anything about extending the equality. Right?
>>>>>
>>>>> Pat
>>>>>
>>>>>> Therefore, we need to spell it
>>>>>>
>>>>>> SELECT *
>>>>>> { ?x :p ?v
>>>>>>     FILTER ( ?v !sameLiteral "a"^^t:type1 )
>>>>>> }
>>>>>>
>>>>>> or something like this.
>>>>>>
>>>>>>> So, either the tests are OK, or I have misunderstood your point.
>>>>>>>
>>>>>>> Eric? Or indeed, anyone with anything useful to say?
>>>>>>>
>>>>>>> Pat
>>>>>>> --
>>>>>>> ---------------------------------------------------------------------
>>>>>>> IHMC		(850)434 8903 or (650)494 3973   home
>>>>>>> 40 South Alcaniz St.	(850)202 4416   office
>>>>>>> Pensacola			(850)202 4440   fax
>>>>>>> FL 32502			(850)291 0667    cell
>>>>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>>>>>
>>>>>> --
>>>>>> -eric
>>>>>>
>>>>>> home-office: +1.617.395.1213 (usually 900-2300 CET)
>>>>>> 	    +33.1.45.35.62.14
>>>>>> cell:       +33.6.73.84.87.26
>>>>>>
>>>>>> (eric@w3.org)
>>>>>> Feel free to forward this message to any list for any purpose other than
>>>>>> email address distribution.
>
Received on Wednesday, 9 August 2006 09:04:34 UTC