Re: my action item from Seaborne, Andy on 2006-08-07 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 07 Aug 2006 12:55:08 +0100
To: Eric Prud'hommeaux <eric@w3.org>
CC: Pat Hayes <phayes@ihmc.us>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <44D72A1C.70605@hp.com>
Eric Prud'hommeaux wrote:
> On Fri, Aug 04, 2006 at 10:48:34AM +0100, Seaborne, Andy wrote:
>> How about a scheme like this for comparison of literals:
>>
>> 1/ Be explicit about value spaces; the design is comparison by-value.
>>
>> All operators return true if the implementation positively knows that the 
>> two values compare as needed, return false if the implementation positively 
>> knows that the two values do not compare as needed, and return an error if it 
>> does not know.
>>
>> http://www.w3.org/TR/xmlschema-2/#value-space
>>
>> 2/ Define sop:value-compare(A, B) to be -1, 0 , 1 or error depending on 
>> whether A and B are less than, equal, greater than, or it's an unknown 
>> comparison.
>>
>> Note that sop:value-compare can be partial.  A processor always knows that 
>> A = B, without needing anything else, if the lexical forms and datatypes match.
>>
>> 3/ Define =, !=, <, <= , > , >= to be the relevant result(s) of 
>> value-compare
>>
>> 4/ State which datatypes are required for a SPARQL engine (this could 
>> even be fewer than the current set: xsd:int but not arbitrary-length 
>> integers; no decimals, or no dateTime, which are a bit larger in 
>> implementation cost).
>>
>> 5/ Show that value-compare maps to the "XPath Tests" table for the 
>> operators where an implementation provides them.
> 
> I found it more intuitive to use the XPath tests directly. Proposal below.

I'm not sure what your position is with respect to points 1-4.  Is the 
proposal below about how to express the use of F&O tests for sop:value-compare 
for the known datatypes or is that text the only mapping from grammar items to 
F&O?

If it's the latter, the proposal is as before, isn't it?  That is, introduce 
"sameLiteral" and reserve the syntax of infix operators for a fixed set of 
datatypes, including "="?

More below.

> 
>> 6/ = and != can be defined on non-literals by RDFterm-equal, as currently.
>>
>> In terms of text change and test change and implementation impact, this is 
>> actually quite a small change because it exactly agrees on the fixed set of 
>> datatypes we already have.  It just permits extensibility through the 
>> principle of value testing.
>>
>> An implementation can provide more datatypes as it chooses, meeting the 
>> "Extensible Value Testing".  It is explicitly monotonic in the capabilities 
>> of the processor.  But now legacy or other standards for datatypes can be 
>> added smoothly (e.g. ISO 8601 date and time which is not exactly the same 
>> as XSD dateTime).
>>
>> 	Andy
>>
>>
>>
>> Pat Hayes wrote:
>>>> On Tue, Aug 01, 2006 at 11:19:45AM -0700, Pat Hayes wrote:
>>>>> Re. my action item from today's telecon.
>>>>>
>>>>> After looking at Andy's examples in
>>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104.html
>>>>> more closely, his example 6 seems to behave correctly for the issue
>>>>> that you were raising, if I understand it properly. In which case no
>>>>> further examples are needed, and my action item is moot.
>>>>>
>>>>> So let me see if I have got this right.
>>>>>
>>>>> My understanding of your concern was that we had a nonmonotonic
>>>>> situation because a not-equal ( !=) filter, as in example 6, behaved
>>>>> as follows: when faced with an unknown datatype, it would revert to a
>>>>> string-not-equal test on the literal string, and so succeed when the
>>>>> literal strings were distinct but the type URI matches; and then this
>>>>> success might transform to a failure when better datatyping
>>>>> information is available.
>>>> Our measure of monotonicity is that adding knowledge to the system
>>>> does not cause us to rescind conclusions. We should never get answers
>>>> from the naive implementation that we don't get from the omniscient
>>>> one (adding support for a datatype should not cause us to rescind
>>>> answers).
>>> Agreed.
>>>
>>>> The current text in rq2{3,4} has:
>>>>
>>>> [[
>>>> When selecting the operator definition for a given set of parameters,
>>>> the definition with the most specific parameters applies. For
>>>> instance, when evaluating xsd:integer = xsd:signedInt, the definition
>>>> for = with two numeric parameters applies, rather than the one with
>>>> two RDF terms. The table is arranged so that upper-most viable
>>>> candidate is the most specific.
>>>> ...
>>>> A != B	numeric	      numeric	    fn:not(op:numeric-equal(A, B))
>>>> A != B	xsd:boolean   xsd:boolean   fn:not(op:boolean-equal(A, B))
>>>> A != B	xsd:dateTime  xsd:dateTime  fn:not(op:dateTime-equal(A, B))
>>>> ...
>>>> A != B	RDF term      RDF term	    fn:not(RDFterm-equal(A, B))
> 
> PROPOSED: change the paragraph to:
> 
> [[
> 11.3 Operator Mapping
> 
> The SPARQL grammar identifies a set of operators (for instance, &&, *,
> isIRI) used to construct constraints. The following table associates
> each of these grammatical productions with the appropriate operands
> and an operator function defined either by XPath or the SPARQL
> operators specified in section 11.4. Operators invoked without
> appropriate operands result in a type error.

This paragraph precludes adding new datatypes.  There are quite a few date 
representations around so I think it is desirable to have a scheme that allows 
existing data to be mapped to RDF and used with SPARQL without needing a 
transformation of datatypes.

> 
> SPARQL follows XPath's scheme for numeric type promotions and subtype
> substitution for arguments to numeric operators. The XPath Operator
> Mapping rules for operands of type {xs:integer, xs:decimal, xs:float,
> xs:double} or any derivative types apply to SPARQL operators as well.
> For instance, when evaluating xsd:integer = xsd:signedInt, the
> definition for = with two numeric parameters applies.  Some of the
> operators are associated with nested function expressions,
> e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
> definitions, fn:not and op:numeric-equal produce an error if their
> argument is an error.
> ]]
> 
> This is a little more formal than the current text, but does not go so
> far as to quote the relevant 3 paragraphs and table from
> http://www.w3.org/TR/xpath20/#mapping .
> 
> We also need to change the operators for RDFterm-equal(A, B):
> ('-' and '+' indicate rows removed from, and added to the table.)
> [[
> - A = B		       RDF term	  RDF term   RDFterm-equal(A, B)
> - A != B	       RDF term	  RDF term   fn:not(RDFterm-equal(A, B))
> + A = B		       IRI	  IRI	     RDFterm-equal(A, B)
> + A = B		       blank node blank node RDFterm-equal(A, B)
> + A != B	       IRI	  IRI	     RDFterm-equal(A, B)
> + A != B	       blank node blank node RDFterm-equal(A, B)

fn:not(RDFterm-equal(A, B)) ?

> + sameLITERAL(A, B)    literal   literal     RDFterm-equal(A, B)
> ]]
> 
> and in grammar rule 57, we add:
> [[
> 	| 'sameLITERAL' '(' Expression ',' Expression ')' 
> ]]
> 
> I think this is what Andy meant by "Be explicit about value spaces".

Not really.  I don't see any discussion of value spaces.  If those are the 
only changes, the text still precludes adding understanding of a new datatype 
(examples: xsd:date or ISO 8601 date and time) and having the syntax for "<" 
work.  A design that does work for "<" can also work for "=" and "!=".

Examples:

SELECT ?x { ?x :p ?v } ORDER BY ?v
   or
SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v < ?w ) }

for ?v, ?w being xsd:dates or ISO 8601 dates (including mixed).

The fact that

SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v = ?w ) }

works only for a fixed set of types is rather confusing.


The point of being explicit about value spaces is to make the set of datatypes 
a processor can provide an open set, not a closed set, provided it follows the 
principle that a processor tests based on compatible value spaces and only 
returns true/false if it definitely knows that relationship to be true/false. 
That open set can be smaller or larger than the current list.

   "<" maps to sop:value-compare(A,B) == -1
and
   sop:value-compare(numeric,numeric) == -1
maps to op:numeric-less-than(A, B)  (well - fn:numeric-compare would be nice!)
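
A minimal sketch of this mapping in Python, under stated assumptions: literals are modelled as (lexical form, datatype IRI) pairs, the registry KNOWN and the names value_compare, less_than, equal, not_equal and CompareError are hypothetical, and Python's Decimal stands in for the F&O numeric operators. It is an illustration of the idea, not proposed spec text.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

class CompareError(Exception):
    """The processor cannot positively decide the comparison."""

# Hypothetical registry: datatype IRI -> (value space name, lexical-to-value parser).
# A processor may register more or fewer datatypes than shown here.
KNOWN = {
    XSD + "integer": ("numeric", Decimal),
    XSD + "decimal": ("numeric", Decimal),
}

def value_compare(a, b, known=KNOWN):
    """Compare two (lexical form, datatype IRI) literals by value.

    Returns -1, 0 or 1; raises CompareError when the answer is unknown.
    """
    if a == b:
        return 0  # same lexical form and same datatype: known to be equal
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a not in known or dt_b not in known:
        raise CompareError("unsupported datatype")
    (space_a, parse_a), (space_b, parse_b) = known[dt_a], known[dt_b]
    if space_a != space_b:
        raise CompareError("incompatible value spaces")
    try:
        va, vb = parse_a(lex_a), parse_b(lex_b)
        return (va > vb) - (va < vb)
    except (ValueError, ArithmeticError) as exc:
        raise CompareError("ill-formed lexical form") from exc

# The infix operators are just readings of the value_compare result.
def less_than(a, b, known=KNOWN):
    return value_compare(a, b, known) == -1

def equal(a, b, known=KNOWN):
    return value_compare(a, b, known) == 0

def not_equal(a, b, known=KNOWN):
    return value_compare(a, b, known) != 0
```

In this sketch an error propagates as an exception, so not_equal never turns an unknown comparison into true. Supporting a further datatype means adding an entry to the registry; the operator definitions themselves do not change.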

> I have not done a survey of what tests would need to change; any that do
> a positive literal comparison based on an invocation of RDFterm-equal
> (as currently implied by '=' or '!=' on literals that are not both of
> known data types).
> 
> 
> In general, I think this text follows XPath nicely, and provides more
> clarity for implementors. Users have to get over having to explicitly
> state when they want tests to not depend on data type support, but I
> think I've shown that that is necessary to meet our monotonicity
> constraints (though I didn't include "Q.E.D." as PatH had advised).

Sorry - I don't understand that.  Why is it necessary to have sameLiteral to 
meet the monotonicity goal, rather than being one way to meet the monotonicity 
goal?

Specifically, in what way does the use of sop:value-compare, which is defined 
to be an error when the comparison can not be done on some pair of values, 
violate monotonicity?  What's the counter-example?


For the Roman numeral example below,

   sop:value-compare("II", 2) is either 0 (equal) or error.

That means that applying fn:not will not come up with a non-monotonic 
situation because fn:not(error) is still an error and an error means something 
is not included in the result set.
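
The argument can be checked with a small Python sketch. Everything named here is an assumption made for illustration: the roman:numeral IRI, the ERROR marker, the roman_to_int parser, and the simplification that every supported datatype maps into one integer value space. The naive processor knows only xsd:integer; the extended one also knows the Roman numeral type.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
ROMAN = "http://example.org/roman#numeral"  # hypothetical datatype IRI

ERROR = object()  # stands in for the SPARQL error value

def roman_to_int(s):
    """Parse a simple Roman numeral (subtractive notation supported)."""
    digits = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500}
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = digits[ch]
        total += -v if digits.get(nxt, 0) > v else v
    return total

NAIVE = {XSD_INTEGER: int}
EXTENDED = {XSD_INTEGER: int, ROMAN: roman_to_int}

def value_compare(a, b, parsers):
    """Return -1, 0, 1, or ERROR when the comparison is not known."""
    if a == b:
        return 0
    if a[1] in parsers and b[1] in parsers:
        va, vb = parsers[a[1]](a[0]), parsers[b[1]](b[0])
        return (va > vb) - (va < vb)
    return ERROR

def not_equal(a, b, parsers):
    r = value_compare(a, b, parsers)
    return ERROR if r is ERROR else r != 0  # negating an error is still an error

def filter_not_equal(candidates, constant, parsers):
    # FILTER keeps a solution only when the test is definitely true.
    return [c for c in candidates if not_equal(c, constant, parsers) is True]

candidates = [("II", ROMAN), ("III", ROMAN), ("2", XSD_INTEGER), ("3", XSD_INTEGER)]
two = ("2", XSD_INTEGER)

naive = filter_not_equal(candidates, two, NAIVE)
extended = filter_not_equal(candidates, two, EXTENDED)
```

Here naive contains only the literal "3"^^xsd:integer, while extended contains that literal plus "III"^^roman:numeral. "II" is excluded by both: as an error in the naive case and as false in the extended case. Adding the datatype only adds answers, so nothing is rescinded.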


[
By the way:

http://www.w3.org/TR/xmlschema-2/#decimal-lexical-representation
Decimal is limited to 0-9, ., + and -, so Roman numerals can't be a restriction of it.
]
	Andy

 >>>> The naive implementation sees
 >>>>  "2"^^xsd:integer != "II"^^roman:numeral
 >>>> and says "are they both numerics? no, boolean? no ... RDF terms? yes"
 >>>> and does the RDFterm-equal test. They are not the same term so the
 >>>> answer is TRUE (remember, *not* equal).
 >>> OK, I agree this is broken as written, but then this also seems to be
 >>> at odds with test 6 in that test suite. So I guess my point is,
 >>> regardless of what the spec currently says, those tests illustrate
 >>> what the right behavior OUGHT to be, which would be that a != between
 >>> two literals with unknown datatypes is simply unknown, and can never
 >>> succeed, regardless of the RDF term equality result between them. So,
 >>> reverting now to my very limited action item, I don't need to tweak
 >>> those tests or add to them in order to show what the result SHOULD
 >>> be. Right?
 >>>
 >>>> Some wise-guy adds support for roman:numeral to make the omniscient
 >>>> implementation from the following schema (note: restriction of decimal):
 >>>>
 >>>>  <xs:simpleType name="numeral" id="numeral">
 >>>>    <xs:restriction base="xs:decimal">
 >>>>      <xs:fractionDigits fixed="true" value="0"
 >>>> id="romanNumeral.fractionDigits"/>
 >>>>      <xs:pattern value="[IVDXLC]+"/>
 >>>>      <xs:minInclusive value="0" id="romanNumeral.minInclusive"/>
 >>>>    </xs:restriction>
 >>>>  </xs:simpleType>
 >>>>
 >>>> Now the implementation says "are they both decimals? yep" and returns
 >>>> FALSE (II is *not* != 2), causing us to lose an answer that we had in
 >>>> the naive implementation.
 >>>>
 >>>>
 >>>>> But this is not what the test examples indicate. With this rule, in
 >>>>> case #6, it would give the answer binding [ x/x1, v/"b"^^t:type1 ],
 >>>>> but in fact it does not: it gives no answers, as it should in order
 >>>>> to be monotonic when more datatype information is available. And the
 >>>>> comment on test 6 seems to indicate that 'no result' is determined
 >>>>> in this case for reasons of preserving monotonicity, and works
 >>>>> symmetrically for equality and not-equality.
 >>>> I believe that this test does illustrate the problem. I can concoct a
 >>>> type system where the two are, in cleverer systems, known to be the
 >>>> same value.
 >>> Right, and in that case - following now the behavior indicated by the
 >>> example, not by the spec text you cite - the behavior will be
 >>> indistinguishable from what it is now (no answers) but if you instead
 >>> concoct a system in which they have different values, then the query
 >>> will succeed. So either way, we get monotonic behavior. Again, note I
 >>> am not following the first-line-in-table rule here, but the behavior
 >>> as specified in the test suite email: they give different results on
 >>> test 6.
 >>>
 >>> So, if we follow the rule as illustrated by test 6, which as I read
 >>> the test is that when either of A or B is typed with an unknown
 >>> datatype, then  A != B test always fails while A=B succeeds only when
 >>> A and B are the exact same literal string and same datatype URI, then
 >>> we don't need to do anything about extending the equality. Right?
 >>>
 >>> Pat
 >>>
 >>>> Therefore, we need to spell it
 >>>>
 >>>> SELECT *
 >>>> { ?x :p ?v
 >>>>     FILTER ( ?v !sameLiteral "a"^^t:type1 )
 >>>> }
 >>>>
 >>>> or something like this.
 >>>>
 >>>>> So, either the tests are OK, or I have misunderstood your point.
 >>>>>
 >>>>> Eric? Or indeed, anyone with anything useful to say?
 >>>>>
 >>>>> Pat
 >>>>> --
 >>>>> ---------------------------------------------------------------------
 >>>>> IHMC		(850)434 8903 or (650)494 3973   home
 >>>>> 40 South Alcaniz St.	(850)202 4416   office
 >>>>> Pensacola			(850)202 4440   fax
 >>>>> FL 32502			(850)291 0667    cell
 >>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
 >>>>>
 >>>> --
 >>>> -eric
 >>>>
 >>>> home-office: +1.617.395.1213 (usually 900-2300 CET)
 >>>> 	    +33.1.45.35.62.14
 >>>> cell:       +33.6.73.84.87.26
 >>>>
 >>>> (eric@w3.org)
 >>>> Feel free to forward this message to any list for any purpose other than
 >>>> email address distribution.
 >>>
 >
Received on Monday, 7 August 2006 11:55:48 UTC