Re: my action item

On Fri, Aug 04, 2006 at 10:48:34AM +0100, Seaborne, Andy wrote:
> 
> How about a scheme like this for comparison of literals:
> 
> 1/ Be explicit about value spaces; the design is comparison by-value.
> 
> All operators return true if the implementation positively knows that the 
> two values compare as needed, return false if the implementation positively 
> knows that the two values do not compare as needed, and return an error if 
> it does not know.
> 
> http://www.w3.org/TR/xmlschema-2/#value-space
> 
> 2/ Define sop:value-compare(A, B) to be -1, 0 , 1 or error depending on 
> whether A and B are less than, equal, greater than, or it's an unknown 
> comparison.
> 
> Note that sop:value-compare can be partial.  A processor always knows A = 
> B without much else if the lexical forms and datatypes match.
> 
> 3/ Define =, !=, <, <= , > , >= to be the relevant result(s) of 
> value-compare
> 
> 4/ State which datatypes that are required for a SPARQL engine (this could 
> even be less than the current set; xsd:int but not arbitrary length 
> integers; no decimals, or no dateTime which are a bit larger in 
>  implementation costs).
> 
> 5/ Show that value-compare maps to the "XPath Tests" table for the 
> operators where an implementation provides them.

I found it more intuitive to use the XPath tests directly. Proposal below.
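
For reference, steps 2/ and 3/ above might be sketched as follows. This is
only an illustration, not proposed spec text: the names ERROR, PARSERS,
value_compare and compare_op are made up for the sketch, literals are
modelled as (lexical form, datatype) pairs, and the toy engine is assumed
to support just xsd:integer and xsd:decimal.

```python
from decimal import Decimal

ERROR = "error"  # hypothetical marker for "the implementation does not know"

# Assumption: the only datatypes this toy engine can map to values.
PARSERS = {"xsd:integer": int, "xsd:decimal": Decimal}

def value_compare(a, b):
    """Return -1, 0, 1, or ERROR for two (lexical, datatype) literals."""
    if a == b:
        # Same lexical form and same datatype: equal without knowing the type.
        return 0
    if a[1] in PARSERS and b[1] in PARSERS:
        va = PARSERS[a[1]](a[0])
        vb = PARSERS[b[1]](b[0])
        return (va > vb) - (va < vb)
    return ERROR

def compare_op(op, a, b):
    """Define =, !=, <, <=, >, >= in terms of value_compare."""
    c = value_compare(a, b)
    if c is ERROR:
        return ERROR
    return {
        "=": c == 0, "!=": c != 0,
        "<": c < 0, "<=": c <= 0,
        ">": c > 0, ">=": c >= 0,
    }[op]
```

Note that compare_op("!=", ...) on two literals of an unsupported datatype
with different lexical forms yields ERROR rather than true, which is the
point of the scheme.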

> 6/ = and != can be defined on non-literals by RDFterm-equal as currently.
> 
> In terms of text change and test change and implementation impact, this is 
> actually quite a small change because it exactly agrees on the fixed set of 
> datatypes we already have.  It just permits extensibility through the 
> principle is value testing.
> 
> An implementation can provide more datatypes as it chooses, meeting the 
> "Extensible Value Testing".  It is explicitly monotonic in the capabilities 
> of the processor.  But now legacy or other standards for datatypes can be 
> added smoothly (e.g. ISO 8601 date and time which is not exactly the same 
> as XSD dateTime).
> 
> 	Andy
> 
> 
> 
> Pat Hayes wrote:
> >>On Tue, Aug 01, 2006 at 11:19:45AM -0700, Pat Hayes wrote:
> >>> Re. my action item from today's telecon.
> >>>
> >>> After looking at Andy's examples in
> >>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104.html
> >>> more closely, his example 6 seems to behave correctly for the issue
> >>> that you were raising, if I understand it properly. In which case no
> >>> further examples are needed, and my action item is moot.
> >>>
> >>> So let me see if I have got this right.
> >>>
> >>> My understanding of your concern was that we had a nonmonotonic
> >>> situation because a not-equal ( !=) filter, as in example 6, behaved
> >>> as follows: when faced with an unknown datatype, it would revert to a
> >>> string-not-equal test on the literal string, and so succeed when the
> >>> literal strings were distinct but the type URI matches; and then this
> >>> success might transform to a failure when better datatyping
> >>> information is available.
> >>Our measure of monotonicity is that adding knowledge to the system
> >>does not cause us to rescind conclusions. We should never get answers
> >>from the naive implementation that we don't get from the omniscient
> >>one (adding support for a datatype should not cause us to rescind
> >>answers).
> >
> >Agreed.
> >
> >> The current text in rq2{3,4} has:
> >>
> >>[[
> >>When selecting the operator definition for a given set of parameters,
> >>the definition with the most specific parameters applies. For
> >>instance, when evaluating xsd:integer = xsd:signedInt, the definition
> >>for = with two numeric parameters applies, rather than the one with
> >>two RDF terms. The table is arranged so that the upper-most viable
> >>candidate is the most specific.
> >>...
> >>A != B	numeric	      numeric	    fn:not(op:numeric-equal(A, B))
> >>A != B	xsd:boolean   xsd:boolean   fn:not(op:boolean-equal(A, B))
> >>A != B	xsd:dateTime  xsd:dateTime  fn:not(op:dateTime-equal(A, B))
> >>...
> >>A != B	RDF term      RDF term	    fn:not(RDFterm-equal(A, B))

PROPOSED: change the paragraph to:

[[
11.3 Operator Mapping

The SPARQL grammar identifies a set of operators (for instance, &&, *,
isIRI) used to construct constraints. The following table associates
each of these grammatical productions with the appropriate operands
and an operator function defined either by XPath or by the SPARQL
operators specified in section 11.4. Operators invoked without
appropriate operands result in a type error.


SPARQL follows XPath's scheme for numeric type promotions and subtype
substitution for arguments to numeric operators. The XPath Operator
Mapping rules for operands of type {xs:integer, xs:decimal, xs:float,
xs:double} or any derivative types apply to SPARQL operators as well.
For instance, when evaluating xsd:integer = xsd:signedInt, the
definition for = with two numeric parameters applies.  Some of the
operators are associated with nested function expressions,
e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
definitions, fn:not and op:numeric-equal produce an error if their
argument is an error.
]]

This is a little more formal than the current text, but does not go so
far as to quote the relevant three paragraphs and table from
http://www.w3.org/TR/xpath20/#mapping .
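
As a rough illustration of the proposed paragraph (not spec text), here is
a minimal Python sketch of numeric promotion, subtype substitution, and
error propagation through fn:not(op:numeric-equal(A, B)). The names
SparqlTypeError, PROMOTION, BASE, numeric_equal and not_equal are
assumptions made up for this sketch; literals are modelled as
(lexical form, datatype IRI) pairs, and xsd:int and xsd:long stand in for
"any derivative types".

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

class SparqlTypeError(Exception):
    """Stands in for the type error raised on inappropriate operands."""

# XPath numeric promotion order: integer -> decimal -> float -> double.
PROMOTION = [XSD + "integer", XSD + "decimal", XSD + "float", XSD + "double"]

# Subtype substitution: a derived type is treated as its numeric base type.
# Assumption: only two derived types are listed here, for illustration.
BASE = {XSD + "int": XSD + "integer", XSD + "long": XSD + "integer"}

def numeric_operand(lit):
    """Return (promotion rank, lexical form) or raise a type error."""
    lexical, datatype = lit
    datatype = BASE.get(datatype, datatype)
    if datatype not in PROMOTION:
        raise SparqlTypeError(f"not a numeric operand: {datatype}")
    return PROMOTION.index(datatype), lexical

def numeric_equal(a, b):
    """op:numeric-equal after promoting both operands to the wider type."""
    rank_a, lex_a = numeric_operand(a)
    rank_b, lex_b = numeric_operand(b)
    # Ranks 0-1 are exact (integer, decimal); ranks 2-3 are floating point.
    convert = float if max(rank_a, rank_b) >= 2 else Decimal
    return convert(lex_a) == convert(lex_b)

def not_equal(a, b):
    """fn:not(op:numeric-equal(A, B)); an inner error simply propagates."""
    return not numeric_equal(a, b)
```

With this, not_equal on an xsd:integer and a literal of an unsupported
datatype raises SparqlTypeError instead of returning a boolean.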

We also need to change the operators for RDFterm-equal(A, B):
('-' and '+' indicate rows removed from, and added to the table.)
[[
- A = B              RDF term    RDF term    RDFterm-equal(A, B)
- A != B             RDF term    RDF term    fn:not(RDFterm-equal(A, B))
+ A = B              IRI         IRI         RDFterm-equal(A, B)
+ A = B              blank node  blank node  RDFterm-equal(A, B)
+ A != B             IRI         IRI         fn:not(RDFterm-equal(A, B))
+ A != B             blank node  blank node  fn:not(RDFterm-equal(A, B))
+ sameLITERAL(A, B)  literal     literal     RDFterm-equal(A, B)
]]

and in grammar rule 57, we add:
[[
	| 'sameLITERAL' '(' Expression ',' Expression ')' 
]]

I think this is what Andy meant by "Be explicit about value spaces". I
have not done a survey of which tests would need to change; presumably
any that expect a positive literal comparison based on an invocation of
RDFterm-equal (as currently implied by '=' or '!=' on literals that are
not both of known data types).
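
To make the effect of the table change concrete, here is a minimal sketch
of the dispatch it implies, taking != as the negation of = as in the rows
being removed. Everything named here (IRI, BlankNode, Literal, KNOWN,
equals, not_equals, same_literal, SparqlTypeError) is hypothetical, and
the toy engine is assumed to know only xsd:integer; plain literals and
the other typed rows are left out.

```python
from dataclasses import dataclass

class SparqlTypeError(Exception):
    """Stands in for the type error raised when no table row applies."""

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

# Assumption: the only datatype this toy engine can map to a value.
KNOWN = {"xsd:integer": int}

def rdfterm_equal(a, b):
    # Dataclass equality: same kind of term and identical fields.
    return a == b

def equals(a, b):
    # IRI/IRI and blank node/blank node rows use RDFterm-equal.
    for kind in (IRI, BlankNode):
        if isinstance(a, kind) and isinstance(b, kind):
            return rdfterm_equal(a, b)
    # Literal rows compare by value, and only for known datatypes.
    if isinstance(a, Literal) and isinstance(b, Literal):
        if a.datatype in KNOWN and b.datatype in KNOWN:
            return KNOWN[a.datatype](a.lexical) == KNOWN[b.datatype](b.lexical)
    # No generic "RDF term" row is left to fall back on.
    raise SparqlTypeError(f"no '=' row applies to {a!r}, {b!r}")

def not_equals(a, b):
    return not equals(a, b)

def same_literal(a, b):
    # Term-level comparison that does not depend on datatype support.
    if isinstance(a, Literal) and isinstance(b, Literal):
        return rdfterm_equal(a, b)
    raise SparqlTypeError("sameLITERAL needs two literals")
```

Here equals treats "2" and "02" as the same xsd:integer value while
same_literal does not, and '=' on two t:type1 literals is a type error
where same_literal still gives an answer.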


In general, I think this text follows XPath nicely, and provides more
clarity for implementors. Users have to get over having to explicitly
state when they want tests to not depend on data type support, but I
think I've shown that that is necessary to meet our monotonicity
constraints (though I didn't include "Q.E.D." as PatH had advised).
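
The monotonicity argument can be checked mechanically with the
roman:numeral example quoted below. This is a sketch under stated
assumptions: parse_roman, not_equal_old, not_equal_new, keeps and ERROR
are names invented here, literals are (lexical form, datatype) pairs, and
a FILTER is taken to keep a solution only when its expression is true.

```python
ERROR = "error"  # hypothetical marker for a type error inside a FILTER

def parse_roman(s):
    """Value of a roman numeral over the letters allowed by the schema."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500}
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = values[ch]
        # A smaller letter before a larger one is subtracted (IV = 4).
        total += -v if nxt != " " and values[nxt] > v else v
    return total

def not_equal_old(a, b, parsers):
    """Current table: fall through to the generic RDF term row."""
    if a[1] in parsers and b[1] in parsers:
        return parsers[a[1]](a[0]) != parsers[b[1]](b[0])
    return a != b  # fn:not(RDFterm-equal(A, B))

def not_equal_new(a, b, parsers):
    """Proposed table: no RDF term row for literals, so a type error."""
    if a[1] in parsers and b[1] in parsers:
        return parsers[a[1]](a[0]) != parsers[b[1]](b[0])
    return ERROR

def keeps(result):
    """A FILTER keeps a solution only if the expression is true."""
    return result is True

naive = {"xsd:integer": int}
omniscient = {**naive, "roman:numeral": parse_roman}
two, ii = ("2", "xsd:integer"), ("II", "roman:numeral")

# Monotonic means: every answer of the naive engine survives in the
# omniscient one, i.e. "naive implies omniscient".
old_monotonic = (not keeps(not_equal_old(two, ii, naive))
                 or keeps(not_equal_old(two, ii, omniscient)))
new_monotonic = (not keeps(not_equal_new(two, ii, naive))
                 or keeps(not_equal_new(two, ii, omniscient)))
```

Under the old rule the naive engine keeps the solution and the omniscient
one drops it, so old_monotonic is False; under the proposed rule neither
engine keeps it, so new_monotonic is True.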

> >>The naive implementation sees
> >>  "2"^^xsd:integer != "II"^^roman:numeral
> >>and says "are they both numerics? no, boolean? no ... RDF terms? yes"
> >>and does the RDFterm-equal test. They are not the same term so the
> >>answer is TRUE (remember, *not* equal).
> >
> >OK, I agree this is broken as written, but then this also seems to be 
> >at odds with test 6 in that test suite. So I guess my point is, 
> >regardless of what the spec currently says, those tests illustrate 
> >what the right behavior OUGHT to be, which would be that a != between 
> >two literals with unknown datatypes is simply unknown, and can never 
> >succeed, regardless of the RDF term equality result between them. So, 
> >reverting now to my very limited action item, I don't need to tweak 
> >those tests or add to them in order to show what the result SHOULD 
> >be. Right?
> >
> >>Some wise-guy adds support for roman:numeral to make the omniscient
> >>implementation from the following schema (note: restriction of decimal):
> >>
> >>  <xs:simpleType name="numeral" id="numeral">
> >>    <xs:restriction base="xs:decimal">
> >>      <xs:fractionDigits fixed="true" value="0" 
> >>id="romanNumeral.fractionDigits"/>
> >>      <xs:pattern value="[IVDXLC]+"/>
> >>      <xs:minInclusive value="0" id="romanNumeral.minInclusive"/>
> >>    </xs:restriction>
> >>  </xs:simpleType>
> >>
> >>Now the implementation says "are they both decimals? yep" and returns
> >>FALSE (II is *not* != 2), causing us to lose an answer that we had in
> >>the naive implementation.
> >>
> >>
> >>> But this is not what the test examples indicate. With this rule, in
> >>> case #6, it would give the answer binding [ x/x1, v/"b"^^t:type1 ],
> >>> but in fact it does not: it gives no answers, as it should in order
> >>> to be monotonic when more datatype information is available. And the
> >>> comment on text 6 seems to  indicate that 'no result' is determined
> >>> in this case for reasons of preserving monotonicity, and works
> >>> symmetrically for equality and not-equality.
> >>I believe that this test does illustrate the problem. I can concoct a
> >>type system where the two are, in cleverer systems, known to be the
> >>same value.
> >
> >Right, and in that case - following now the behavior indicated by the 
> >example, not by the spec text you cite - the behavior will be 
> >indistinguishable from what it is now (no answers) but if you instead 
> >concoct a system in which they have different values, then the query 
> >will succeed. So either way, we get monotonic behavior. Again, note I 
> >am not following the first-line-in-table rule here, but the behavior 
> >as specified in the test suite email: they give different results on 
> >text 6.
> >
> >So, if we follow the rule as illustrated by test 6, which as I read 
> >the test is that when either of A or B is typed with an unknown 
> >datatype, then  A != B test always fails while A=B succeeds only when 
> >A and B are the exact same literal string and same datatype URI, then 
> >we don't need to do anything about extending the equality. Right?
> >
> >Pat
> >
> >>Therefore, we need to spell it
> >>
> >>SELECT *
> >>{ ?x :p ?v
> >>     FILTER ( ?v !sameLiteral "a"^^t:type1 )
> >>}
> >>
> >>or something like this.
> >>
> >>> So, either the tests are OK, or I have misunderstood your point.
> >>>
> >>> Eric? Or indeed, anyone with anything useful to say?
> >>>
> >>> Pat
> >>> --
> >>> ---------------------------------------------------------------------
> >>> IHMC		(850)434 8903 or (650)494 3973   home
> >>> 40 South Alcaniz St.	(850)202 4416   office
> >>> Pensacola			(850)202 4440   fax
> >>> FL 32502			(850)291 0667    cell
> >>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> >>>
> >>--
> >>-eric
> >>
> >>home-office: +1.617.395.1213 (usually 900-2300 CET)
> >>	    +33.1.45.35.62.14
> >>cell:       +33.6.73.84.87.26
> >>
> >>(eric@w3.org)
> >>Feel free to forward this message to any list for any purpose other than
> >>email address distribution.
> >
> >
> 

-- 
-eric

home-office: +1.617.395.1213 (usually 900-2300 CET)
	    +33.1.45.35.62.14
cell:       +33.6.73.84.87.26

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Monday, 7 August 2006 09:33:27 UTC