Re: my action item from Seaborne, Andy on 2006-08-04 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 04 Aug 2006 10:48:34 +0100
To: Pat Hayes <phayes@ihmc.us>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
CC: Eric Prud'hommeaux <eric@w3.org>
Message-ID: <44D317F2.2080006@hp.com>
How about a scheme like this for comparison of literals:

1/ Be explicit about value spaces; the design is comparison by-value.

All operators return true if the implementation positively knows that the two 
values compare as needed, return false if the implementation positively knows 
that the two values do not compare as needed, and return an error if it does 
not know.

http://www.w3.org/TR/xmlschema-2/#value-space

2/ Define sop:value-compare(A, B) to be -1, 0, 1 or error depending on 
whether A is less than, equal to, or greater than B, or the comparison is 
unknown.

Note that sop:value-compare can be partial.  A processor always knows A = B 
without much else if the lexical forms and datatypes match.

3/ Define =, !=, <, <=, >, >= to be the relevant result(s) of value-compare.

4/ State which datatypes are required for a SPARQL engine (this could 
even be less than the current set; xsd:int but not arbitrary length integers; 
no decimals, or no dateTime, which are a bit larger in implementation costs).

5/ Show that value-compare maps to the "XPath Tests" table for the operators 
where an implementation provides them.

6/ = and != can be defined on non-literals by RDFterm-equal as currently.
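
As a rough illustration of steps 2/ and 3/, here is a minimal Python sketch. 
It is not proposed spec text: the names value_compare, CompareError and 
PARSERS are made up for the example, the "required" set of datatypes is 
arbitrarily limited to three numeric ones, and "error" is modelled as a raised 
exception.

```python
from decimal import Decimal
from typing import Callable, Dict, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# A typed literal modelled as (lexical form, datatype URI).
Literal = Tuple[str, str]


class CompareError(Exception):
    """The processor does not know how the two values compare."""


# Assumed required set for this sketch: numeric datatypes only, each mapped
# to a function from lexical form to a point in a shared numeric value space.
PARSERS: Dict[str, Callable[[str], object]] = {
    XSD + "integer": int,
    XSD + "int": int,
    XSD + "decimal": Decimal,
}


def value_compare(a: Literal, b: Literal) -> int:
    """Return -1, 0 or 1; raise CompareError for an unknown comparison."""
    if a == b:
        # Same lexical form and same datatype: equal without further work.
        return 0
    parse_a, parse_b = PARSERS.get(a[1]), PARSERS.get(b[1])
    if parse_a is None or parse_b is None:
        raise CompareError("unknown datatype")
    try:
        va, vb = parse_a(a[0]), parse_b(b[0])
    except (ValueError, ArithmeticError) as exc:
        raise CompareError("ill-formed lexical form") from exc
    return (va > vb) - (va < vb)


# Step 3/: each operator is a reading of value_compare; errors propagate.
def eq(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) == 0


def ne(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) != 0


def lt(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) < 0


def le(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) <= 0


def gt(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) > 0


def ge(a: Literal, b: Literal) -> bool:
    return value_compare(a, b) >= 0
```

With this sketch, "2"^^xsd:integer and "2.0"^^xsd:decimal compare equal by 
value, while != between two different literals of an unknown datatype raises 
the error rather than returning true.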

In terms of text change and test change and implementation impact, this is 
actually quite a small change because it exactly agrees on the fixed set of 
datatypes we already have.  It just permits extensibility through the 
principle of value testing.

An implementation can provide more datatypes as it chooses, meeting the 
"Extensible Value Testing".  It is explicitly monotonic in the capabilities of 
the processor.  But now legacy or other standards for datatypes can be added 
smoothly (e.g. ISO 8601 date and time which is not exactly the same as XSD 
dateTime).
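
To make the monotonicity point concrete, here is a small Python sketch using 
the roman:numeral example from the quoted thread below. Everything in it is 
illustrative: the Processor class, the register method, the parse_roman helper 
and the example.org datatype URI are assumptions for the sketch, and an 
"error" result is modelled as None.

```python
from typing import Callable, Dict, Optional, Tuple

Literal = Tuple[str, str]  # (lexical form, datatype URI)

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
ROMAN = "http://example.org/roman#numeral"  # hypothetical datatype URI


def parse_roman(lexical: str) -> int:
    """Map a Roman numeral to its integer value (simple subtractive rule)."""
    digits = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    if not lexical or any(ch not in digits for ch in lexical):
        raise ValueError("not a Roman numeral: %r" % lexical)
    total, prev = 0, 0
    for ch in reversed(lexical):
        value = digits[ch]
        if value < prev:
            total -= value  # smaller digit before a larger one, as in IV
        else:
            total += value
            prev = value
    return total


class Processor:
    """A processor whose known datatypes can only grow."""

    def __init__(self) -> None:
        self.parsers: Dict[str, Callable[[str], int]] = {XSD_INTEGER: int}

    def register(self, datatype: str, parser: Callable[[str], int]) -> None:
        self.parsers[datatype] = parser

    def not_equal(self, a: Literal, b: Literal) -> Optional[bool]:
        """True or False when positively known, None for an error."""
        if a == b:
            return False  # same lexical form and datatype
        parse_a, parse_b = self.parsers.get(a[1]), self.parsers.get(b[1])
        if parse_a is None or parse_b is None:
            return None  # unknown datatype: the FILTER yields no answer
        try:
            return parse_a(a[0]) != parse_b(b[0])
        except ValueError:
            return None


naive = Processor()
extended = Processor()
extended.register(ROMAN, parse_roman)

two = ("2", XSD_INTEGER)
three = ("3", XSD_INTEGER)
ii = ("II", ROMAN)
```

Here naive.not_equal(two, ii) is None, so the naive processor gives no answer, 
while extended.not_equal(two, ii) is False and extended.not_equal(three, ii) 
is True. Adding the datatype only turns errors into definite results; it never 
takes back a true that the naive processor had already given.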

	Andy



Pat Hayes wrote:
>> On Tue, Aug 01, 2006 at 11:19:45AM -0700, Pat Hayes wrote:
>>>  Re. my action item from today's telecon.
>>>
>>>  After looking at Andy's examples in
>>>  http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104.html
>>>  more closely, his example 6 seems to behave correctly for the issue
>>>  that you were raising, if I understand it properly. In which case no
>>>  further examples are needed, and my action item is moot.
>>>
>>>  So let me see if I have got this right.
>>>
>>>  My understanding of your concern was that we had a nonmonotonic
>>>  situation because a not-equal ( !=) filter, as in example 6, behaved
>>>  as follows: when faced with an unknown datatype, it would revert to a
>>>  string-not-equal test on the literal string, and so succeed when the
>>>  literal strings were distinct but the type URI matches; and then this
>>>  success might transform to a failure when better datatyping
>>>  information is available.
>> Our measure of monotonicity is that adding knowledge to the system
>> does not cause us to rescind conclusions. We should never get answers
>> from the naive implementation that we don't get from the omniscient
>> one (adding support for a datatype should not cause us to rescind
>> answers).
> 
> Agreed.
> 
>>  The current text in rq2{3,4} has:
>>
>> [[
>> When selecting the operator definition for a given set of parameters,
>> the definition with the most specific parameters applies. For
>> instance, when evaluating xsd:integer = xsd:signedInt, the definition
>> for = with two numeric parameters applies, rather than the one with
>> two RDF terms. The table is arranged so that the upper-most viable
>> candidate is the most specific.
>> ...
>> A != B   numeric       numeric       fn:not(op:numeric-equal(A, B))
>> A != B   xsd:boolean   xsd:boolean   fn:not(op:boolean-equal(A, B))
>> A != B   xsd:dateTime  xsd:dateTime  fn:not(op:dateTime-equal(A, B))
>> ...
>> A != B   RDF term      RDF term      fn:not(RDFterm-equal(A, B))
>>
>> The naive implementation sees
>>   "2"^^xsd:integer != "II"^^roman:numeral
>> and says "are they both numerics? no, boolean? no ... RDF terms? yes"
>> and does the RDFterm-equal test. They are not the same term so the
>> answer is TRUE (remember, *not* equal).
> 
> OK, I agree this is broken as written, but then this also seems to be 
> at odds with test 6 in that test suite. So I guess my point is, 
> regardless of what the spec currently says, those tests illustrate 
> what the right behavior OUGHT to be, which would be that a != between 
> two literals with unknown datatypes is simply unknown, and can never 
> succeed, regardless of the RDF term equality result between them. So, 
> reverting now to my very limited action item, I don't need to tweak 
> those tests or add to them in order to show what the result SHOULD 
> be. Right?
> 
>> Some wise-guy adds support for roman:numeral to make the omniscient
>> implementation from the following schema (note: restriction of decimal):
>>
>>   <xs:simpleType name="numeral" id="numeral">
>>     <xs:restriction base="xs:decimal">
>>       <xs:fractionDigits fixed="true" value="0" id="romanNumeral.fractionDigits"/>
>>       <xs:pattern value="[IVDXLC]+"/>
>>       <xs:minInclusive value="0" id="romanNumeral.minInclusive"/>
>>     </xs:restriction>
>>   </xs:simpleType>
>>
>> Now the implementation says "are they both decimals? yep" and returns
>> FALSE (II is *not* != 2), causing us to lose an answer that we had in
>> the naive implementation.
>>
>>
>>>  But this is not what the test examples indicate. With this rule, in
>>>  case #6, it would give the answer binding [ x/x1, v/"b"^^t:type1 ],
>>>  but in fact it does not: it gives no answers, as it should in order
>>>  to be monotonic when more datatype information is available. And the
>>>  comment on test 6 seems to indicate that 'no result' is determined
>>>  in this case for reasons of preserving monotonicity, and works
>>>  symmetrically for equality and not-equality.
>> I believe that this test does illustrate the problem. I can concoct a
>> type system where the two are, in cleverer systems, known to be the
>> same value.
> 
> Right, and in that case - following now the behavior indicated by the 
> example, not by the spec text you cite - the behavior will be 
> indistinguishable from what it is now (no answers) but if you instead 
> concoct a system in which they have different values, then the query 
> will succeed. So either way, we get monotonic behavior. Again, note I 
> am not following the first-line-in-table rule here, but the behavior 
> as specified in the test suite email: they give different results on 
> test 6.
> 
> So, if we follow the rule as illustrated by test 6, which as I read 
> the test is that when either of A or B is typed with an unknown 
> datatype, then the A != B test always fails while A = B succeeds only when 
> A and B are the exact same literal string and same datatype URI, then 
> we don't need to do anything about extending the equality. Right?
> 
> Pat
> 
>> Therefore, we need to spell it
>>
>> SELECT *
>> { ?x :p ?v
>>      FILTER ( ?v !sameLiteral "a"^^t:type1 )
>> }
>>
>> or something like this.
>>
>>>  So, either the tests are OK, or I have misunderstood your point.
>>>
>>>  Eric? Or indeed, anyone with anything useful to say?
>>>
>>>  Pat
>>>  --
>>>  ---------------------------------------------------------------------
>>>  IHMC		(850)434 8903 or (650)494 3973   home
>>>  40 South Alcaniz St.	(850)202 4416   office
>>>  Pensacola			(850)202 4440   fax
>>>  FL 32502			(850)291 0667    cell
>>>  phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>
>> --
>> -eric
>>
>> home-office: +1.617.395.1213 (usually 900-2300 CET)
>> 	    +33.1.45.35.62.14
>> cell:       +33.6.73.84.87.26
>>
>> (eric@w3.org)
>> Feel free to forward this message to any list for any purpose other than
>> email address distribution.
> 
>
Received on Friday, 4 August 2006 09:49:07 UTC